wg3 database languages stephen cannan convenor 2002-05-06
TRANSCRIPT
WG3Database Languages
Stephen CannanConvenor
2002-05-06
S J Cannan2
Agenda
• Working Group changes
• ISO/IEC 9075
• ISO/IEC 20606
• Recap of recent changes
• Sketch of planned changes
S J Cannan3
Working Group changes
• 3 ex-WG5 subprojects absorbed– 20606-1 Authorization and Audit
– 20606-2 Distribution Schema
– 20606-3 Encompassing Transaction
• Effect on WG3 internal structure
S J Cannan4
ISO/IEC 9075• Multi-part standard
– Part 1: SQL/Framework
– Part 2: SQL/Foundation
– Part 3: SQL/CLI
– Part 4: SQL/PSM
– Part 9: SQL/MED
– Part 10: SQL/OLB
– Part 11: SQL/Schemata
– Part 13: SQL/JRT
– Part 14: SQL/XML
S J Cannan5
Part 1: SQL/Framework
• Common definitions & concepts
• Structure of multi-part standard
• Basic conformance structure & statement
• About 85 pages
S J Cannan6
Part 2: SQL/Foundation
• The “core” of the standard
• Includes:– Traditional SQL
– Object-oriented SQL
– Module Language
– Host language bindings (except Java)
– Dynamic SQL
– Direct SQL
• Excludes:– Information Schema & Definition Schema (in Part 11)
• About 1300 pages
S J Cannan7
Part 3: SQL/CLI
• Call-Level Interface
• Best-known implementation: ODBC
• About 400 pages
S J Cannan8
Part 4: SQL/PSM
• Persistent Stored Modules
• PSM-96 specified:– functions & procedures
– SQL-server modules
– computational completeness
• In PSM:1999, functions & procedures moved to Foundation (same in 2003)
• Analogous to PL/SQL, Transact-SQL, etc.
• About 170 pages
S J Cannan9
Part 9: SQL/MED
• Management of External Data
• Foreign Servers, Foreign-Data Wrappers, Foreign Tables
• SQL-aware vs non-SQL-aware
• Datalinks
• Merge with 20606-2 Distribution Schema
• About 500 pages
S J Cannan10
Part 10: SQL/OLB
• Object Language Bindings
• Embedded SQL in Java
• SQLJ Part 0
• About 360 pages
S J Cannan11
Part 11: SQL/Schemata
• Information Schema
• Definition Schema
• About 300 pages
S J Cannan12
Part 13: SQL/JRT
• Java Routines and Types
• Java routines stored in an SQL database, and invoked from SQL statements
• Java classes used as data types of SQL columns
• About 200 pages
S J Cannan13
Part 14: SQL/XML
• Under construction
• Using SQL and XML together
• About 150 pages, but still growing
S J Cannan14
ISO/IEC 20206
• 20606-1 Authorization and Audit
• 20606-3 Encompassing Transaction
S J Cannan15
Project 20206 –1Authorization and Audit
• Addresses issues of:– Authentication
– Access Control
– Transfer Integrity
– Transfer Confidentiality
– Non-repudiation
• No Working Draft available
• CD expected 2002-06-01
• John Hadjioannou listed as Editor but not present
S J Cannan16
Project 20206 –3Encompassing Transaction
• Interacts with external Transaction Managers
• Support for xa_ interface– 9075-6 XA Specialization (SQL/Transaction)
• Propagates Global Context information
• No Working Draft available
• CD expected 2002-06-01
• Bob Sunday listed as Editor but has withdrawn
S J Cannan17
Recap of recent changes: ISO/IEC 9075 SQL:2003
• Since 1999 we have already added:– Part 9 SQL/MED
– Part 10 SQL/OLB
– Part 13 SQL/JRT
– Addendum 1 SQL/OLAP
• SQL:2003 will be the 4th generation– 1987, 1992, 1999
• Consists of:– Many error corrections
(TC stands at 376 pages and growing)
– A few new features
– Part 14 SQL/XML
S J Cannan18
New Features
• Data types
• Predicates
• Semantics
• Security
• Active Database
S J Cannan19
New Data Types
• BIGINT
• MULTISET
S J Cannan20
New Data Types — BIGINT
• Exact numeric, scale 0
• Precision of BIGINT precision of INTEGER
• Must have same radix as SMALLINT and INTEGER
S J Cannan21
New Data Types — MULTISET
• Varying-length, unordered collections of element having specified type
• COL1 INTEGER MULTISET
• COL2 ROW( F1 BIGINT, F2 VARHCAR(4000) ) MULTISET
• No (specified) maximum cardinality
• SQL:2003 collection types include:– MULTISET
– ARRAY
S J Cannan22
New Data Types — MULTISET
• INTEGER MULTISET()– Empty multiset with integer element type (not null!)
• INTEGER MULTISET(2, 3, 5, 7)– Integer multiset with first few primes
• INTEGER MULTISET(SELECT COL1 FROM TBL1 WHERE COL2 > 10)
– Integer multiset populated from values in column of table
S J Cannan23
New Data Types — MULTISET
• CARDINALITY (value1)– Type of value1 must be multiset
– Returns number of elements in value
• SET (value1)– Type of value1 must be multiset
– Returns value1 with duplicate elements removed
• ELEMENT (value1)– Type of value1 must be multiset
– Cardinality of value1 must be 1
– Returns the single element in value1
S J Cannan24
New Data Types — MULTISET
• UNNEST(value1) AS corr-name– Type of value1 must be multiset
– “Un-nests” value1 and turns the elements into rows of a virtual table
– UNNEST MULTISET (2, 3, 5, 7) AS P
P
2
3
5
7
S J Cannan25
New Data Types — MULTISET
• value1 MULTISET setop quantifier value2– setop — UNION or EXCEPT or INTERSECT– quantifier — ALL or DISTINCT
• SELECT col1 MULTISET INTERSECT DISTINCT col2 FROM tbl1 WHERE CARDINALITY(col2) > 50
• Close analogs to ordinary set operators UNION, EXCEPT, and INTERSECT
• However, ALL is the default quantifier
S J Cannan26
New Data Types — MULTISET
• New aggregates:– COLLECT — Transform the values in a group into a multiset
– FUSION — Form a union of the multisets in a group — number of duplicates of a given value in the result is the sum of the number of duplicates in the multisets in the rows of the group
– INTERSECTION — Form an intersection of the multisets in a group — number of duplicates of a given value in the result is the minimum of the number of duplicates in the multisets in the rows of the group
S J Cannan27
New Predicates
• NORMALIZED– If character string argument is in Unicode Normalization Form
C (NFC), returns true
• MEMBER– If first argument is a member of multiset in second argument,
returns true
• SUBMULTISET– If first argument is a submultiset of second argument, returns
true
• SET– If argument is a set (no duplicate values), returns true
S J Cannan28
NORMALIZED Predicate
• string_value IS [ NOT ] NORMALIZED
• Character repertoire of string_value must be UCS
• Encourages, but does not force, normalization to NFC
S J Cannan29
MEMBER Predicate
• value [ NOT ] MEMBER [ OF ] multiset
• multiset must be a multiset
• Type of value comparable to element type of multiset
• If multiset is empty, returns false
• If value is equal to some element of multiset, returns true
• Else, if some element of multiset is null, returns unknown
S J Cannan30
SUBMULTISET Predicate
• multiset1 [ NOT ] SUBMULTISET [ OF ] multiset2
• Both multiset1 and multiset2 must be multisets…
• …and their element types must be comparable
• If the cardinalities of multiset1 and multiset2 are equal and if every value in multiset1 has a corresponding value in multiset2, then returns true
S J Cannan31
SET Predicate
• multiset IS [ NOT ] A SET
• multiset must be a multiset
• If there are no duplicate values in multiset, returns true
• Maximum of 1 null value in a set
S J Cannan32
New Semantics
• MERGE statement
• OLAP: TABLESAMPLE
• Generated columns
• Identity columns and sequence generators
S J Cannan33
New Semantics — MERGE
• If some row in the target table matches some row in the source table according to the specified predicate, then the row in the subject table is updated
• If no row in the target table matches a given row in the source table according to the specified predicate, then the row from the source table is inserted into the subject table
S J Cannan34
New Semantics — MERGE
• MERGE INTO table-name [ AS correlation ]USING table-referenceON search-conditionWHEN MATCHED THEN SET col = value
• MERGE INTO table-name [ AS correlation ]USING table-referenceON search-conditionWHEN NOT MATCHED THEN INSERT [ ( col-list ) ] VALUES ( val-list )
• Both MATCHED and NOT MATCHED clauses permitted once each, in either order
S J Cannan35
TABLESAMPLE
• New feature in the OLAP capability
• Permits evaluation of aggregates on samples derived from database data
• Permits faster debugging when database is huge
• Two forms of sampling: BERNOULLI and SYSTEM
S J Cannan36
TABLESAMPLE
• TABLESAMPLE method ( percentage ) [ repeatable ]
• method is either BERNOULLI or SYSTEM– Bernoulli: Sample table contains approximately percentage %
of the rows of the original table; the probability of a given row of the original table appearing in the sample table is percentage %, independently of every other row
– System: Sample table contains approximately percentage % of the rows of the original table; the probability of a given row of the original table appearing in the sample table is percentage %
• repeatable acts as a random number seed for sampling
S J Cannan37
Generated Columns
• Ordinary columns of base tables: base columns
• Generated column: value computed from values of zero or more base columns in same row
• CREATE TABLE EMPLOYEES ( EMP_ID INTEGER, SALARY DECIMAL(7,2), BONUS DECIMAL(7,2), TOTAL_COMP GENERATED ALWAYS AS ( SALARY + BONUS ), HR_CLERK GENERATED ALWAYS AS ( CURRENT_USER ))
S J Cannan38
Identity columns &Sequence generators
• Identity column: Invented as mechanism to allow automatic population of table keys
• Sequence generator: Invented to allow generation of “next” value in a sequence
• Together, they allow generation of keys for unique columns (or combinations of columns) when required
• Application chooses: select value for identity column or let sequence generator make choice
S J Cannan39
Sequence generators
• Sequence generators– Data type (exact numeric with scale zero)
– Starting value
– Increment (positive: ascending, negative: descending)
– Minimum and maximum values
– Cycle option
– External (explicit schema object) or internal (part of another schema object, such a column)
S J Cannan40
External sequence generators
• CREATE SEQUENCE seqname AS type START WITH value INCREMENT BY value MAXVALUE value CYCLE
• Variations:– NO CYCLE
– NO MAXVALUE, MINVALUE, NO MINVALUE
– Order of clauses can vary
Common sequence generator options}
S J Cannan41
Sequence generators
• Every sequence generator has a “current base value” — initially set to the start value
• Generate next value of a sequence generator: NEXT VALUE FOR seqname
• Returns current base value + N * increment for some N 0
– If computed value > MAXVALUE (or < MINVALUE) and NO CYCLE, then raise an exception
– Otherwise, reset to MINVALUE (or MAXVALUE) and compute new value for some N.
S J Cannan42
Sequence generators
• INSERT INTO TBL ( COL1, COL2 ) VALUES ( 10, NEXT VALUE FOR seqgen )
• CALL myproc ( NEXT VALUE FOR seqgen )
• SET J = J + NEXT VALUE FOR seqgen
S J Cannan43
Sequence generators
• ALTER SEQUENCE seqname– RESTART WITH newbase — resets start of
computation
– Common sequence generator options — change start value, maximum or minimum, increment, cycle option
• DROP SEQUENCE seqname dropoption
S J Cannan44
Identity columns &Sequence generators
• Base tables may optionally have a single identity column
• Data type: exact numeric, scale 0 — e.g., INTEGER
• Associated with internal sequence generator– Start value
– Minimum and/or maximum value
– Increment value
– Cycle option
S J Cannan45
Identity columns &Sequence generators
• GENERATED ALWAYS or GENERATED BY DEFAULT– ALWAYS — Not allowed to UPDATE column; INSERT requires OVERRIDING SYSTEM VALUE
– BY DEFAULT — Allowed to INSERT or UPDATE column; if column not specified on INSERT, then value automatically generated
S J Cannan46
Identity columns &Sequence generators
• CREATE TABLE employees ( EMP_ID INTEGER GENERATED ALWAYS AS IDENTITY START WITH 100 INCREMENT 1 MINVALUE 10 NO MAXVALUE NO CYCLE, SALARY DECIMAL(7,2), ...,)
S J Cannan47
New part – SQL/XML
• Out for FCD ballot
• Known not to be complete
• Current contents comprise infrastructure — mappings of…
– SQL identifiers to & from XML QNames
– SQL types & values onto XML schema types & values
– SQL tables onto XML schema types
– Publishing functions (SQL data expressed as XML)
– New built-in type for SQL: XML
S J Cannan48
To be added:Inserting XML into an SQL database
• Transform character string containing XML text into an XML value
• INSERT INTO employees ( empxml )VALUES XMLPARSE ( '<emp id="15339"><name>Gurney Halleck</name><hiredate>3998-04-25</hiredate><dept>Security</dept></emp>'STRIP WHITESPACE )
• No concrete proposal yet (still working out issues)
S J Cannan49
To be added:Inserting XML into an SQL database
• XMLPARSE does not validate
• But it does:– Syntax check all in-line DTDs
– Process entity references defined in such DTDs
– Apply default values defined in such DTDs
• Whitespace handling not finalized:– STRIP and
– PRESERVE are possible choices
S J Cannan50
To be added: Validating XML in an SQL context
• Validate an XML value against an XML schema (possibly a DTD)
• INSERT INTO employees ( empxml )VALUES XMLVALIDATE ( XMLPARSE ( '<emp id="15339"><name>Gurney Halleck</name><hiredate>3998-04-25</hiredate><dept>Security</dept></emp>' STRIP WHITESPACE ) )
• No concrete proposal yet (still working out issues)
S J Cannan51
To be added: Validating XML in an SQL context
• Many remaining problems:– How to reference DTDs and schemas
– Security problems of referencing arbitrary DTDs and schemas
– Namespace considerations
• xsi:schemaLocation and xsi:noNameSpaceSchemaLocation might identify schemas outside the database — a solution might involve caching schemas inside the database
S J Cannan52
To be added: Retrieving XML from an SQL database
• Transform an XML value into a character string containing XML text
• SELECT XMLSERIALIZE ( e.empxml )INTO :hostvarFROM emp AS eWHERE ...
• No concrete proposal yet (still working out issues)
S J Cannan53
To be added:Retrieving XML from an SQL database• Numerous problems not yet resolved:
– Precise SQL type of the function result (CHARACTER, CHARACTER VARYING, CHARACTER LARGE OBJECT)
– Actual character encoding (UTF8, UTF16, other) — and SQL’s specified character set
– Specified character encoding in prologue
– DTD or schema inference? Creation?
S J Cannan54
Summary
• ISO/IEC 9075– Consolidation of changes since 1999– Correction of errors– New Part (SQL/XML)
• ISO/IEC 20206– Area of concern– Looking for:
– National Body participation
– Editor(s)
S J Cannan55
Questions?