AM18 ASA INTERNALS: DATA MANAGEMENT
GLENN PAULLEY, DEVELOPMENT [email protected] 2005
2
Goals of this presentation
Overview of data management and query processing in Adaptive Server Anywhere 9.0.2
Concentrate on performance issues and problem areas
Provide an overview of SQL Anywhere 9.0 technology
Highlight planned features for the Jasper release
Agenda
Section One: SQL language support, data management
Section Two: query execution and optimization
3
Design goals of SQL Anywhere Studio
Ease of administration
Good out-of-the-box performance
“Embeddability” features: self-tuning
Cross-platform support
Interoperability
4
Motivation for the ASA 9.0 release
Exploit the new architecture of 8.0 and add support for additional language features, including:
GROUP BY ROLLUP
RECURSIVE UNION
Window functions and other OLAP support
XML
Table Functions
INTERSECT and EXCEPT
ORDER BY, SELECT TOP N in any query block, including views
Improve performance
5
Highlights of the ASA 9.0 releases
HTTP server
ASA Index Consultant
Improved performance, scalability
better scalability in OLTP environments
Query processing improvements
optimization refinements, particularly with the server’s cost model
histograms modified according to update DML statements
alternate, efficient execution methods for complex queries
SNMP support
9.0.1 EBF build 1828, Windows platforms only
Formally part of the 9.0.2 release
6
Performance, performance, performance
Version comparison, 10GB DB, elapsed time in minutes per query (chart data)
Query  7.0.4.2788  8.0.0.2065  9.0.0.1073  9.0.1.1751  10.0.1212
Q01    14.6        7.7         4.6         4.2         3.8
Q02    1.1         1.0         2.6         0.7         0.6
Q03    1068.       8.1         3.1         5.7         2.2
Q04    20.7        6.8         2.4         1.9         1.7
Q05    52.8        7.9         3.3         2.8         2.4
Q06    1.0         2.7         1.0         1.2         1.0
Q07    515.2       672.7       3.2         3.3         2.9
Q08    90.2        9.2         3.4         2.9         2.5
Q09    825.1       717.9       6.2         4.7         4.2
Q10    29.1        13.6        3.5         2.5         2.0
Q11    16.1        1.9         0.7         0.5         0.5
Q12    12.8        6.5         2.4         1.9         1.8
Q13    177.8       13.5        3.7         1.5         1.5
Q14    3.8         2.5         0.3         0.4         0.3
Q15    1.2         4.9         0.5         1.5         0.6
Q16    2.9         5.2         2.6         1.5         1.2
Q17    8.3         6.0         4.7         2.2         1.4
Q18    227.3       1500.       14.1        6.7         4.5
Q19    1500.       1500.       3.2         2.3         1.9
Q20    1500.       1500.       1.5         1.9         1.7
Q21    1500.       1500.       8.9         6.6         5.8
Q22    1500.       1500.       0.9         0.7         1.1
Avg    412.2       408.6       3.5         2.6         2.1
7
Contents
Language Support
New SQL constructs supported with 9.0.1
Data Management in 9.0.1
Database organization
Table storage organization
Index storage organization
Physical database design tips
Jasper features
8
New SQL language support in 9.0.1
Table functions (SELECT over a stored procedure)
ORDER BY clause now supported in all SELECT blocks
Necessary to support SELECT TOP n in derived tables, views, and subqueries with correct semantics
RECURSIVE UNION (bill-of-materials) queries
INTERSECT and EXCEPT query expressions
LATERAL keyword for derived tables
Now necessary for derived tables or table expressions containing outer references
WITH clause (common table expressions)
Essentially in-lined view definitions
9
New SQL language support in 9.0.1
SELECT TOP n START AT m
Equivalent functionality to that in MySQL, Postgres
n and m can be variables or host variables
WITH INDEX hint in FROM clause
Named CHECK, PK, FK, UNIQUE constraints
Constraint violation message refers to the constraint name
New catalog tables:
SYSCONSTRAINT contains information about all constraints, even referential integrity constraints
SYSCHECK contains the body of the CHECK constraint; now permit multiple CHECK constraints on the same column(s)
Specific CHECK constraint that is violated appears in error
Not available in older database formats, even if DBUPGRAD is used
10
New SQL language support in 9.0.1
OLAP support
VARIANCE, STD_DEV aggregate functions
ORDER BY clause for LIST aggregate function
GROUP BY ROLLUP, CUBE, GROUPING SETS
Binary set functions (linear regression, covariance, etc.)
Rank functions
Windowed aggregate functions
Construct “moving average” results in a single SQL statement
Support for multiple DISTINCT aggregate functions in a single SELECT block
Necessitates the use of Hash Group By
11
New SQL language support in 9.0.1
Support for SET statement in Transact-SQL dialect stored procedures
Implemented for MS SQL Server compatibility
EXECUTE IMMEDIATE extensions
Procedures can now use EXECUTE IMMEDIATE to execute dynamically-constructed queries which return a result set
WITH ESCAPES ON | OFF
WITH QUOTES ON | OFF
Variable assignment permitted in UPDATE statements (8.0.1)
SELECT INTO base-table
12
New SQL language support in 9.0.1
FOR XML AUTO, FOR XML RAW, FOR XML EXPLICIT, OPENXML procedure (supports XPATH queries over XML column values)
SQLX functionality: xmlelement(), xmlforest(), xmlgen(), xmlconcat(), and xmlagg()
EXPRTYPE() function – outputs the type of the expression argument
Useful when defining computed columns
LOCATE() can handle negative offsets
INSERT WITH AUTO NAME (8.0.2)
13
Table functions
Result set description determined from the catalog; result set must match exactly
Otherwise SQLSTATE ‘WP012’
Workaround: use the WITH clause to annotate the procedure reference in the FROM clause:
SELECT * FROM PROC() WITH( X Integer, Y char(17) )
SELECT * FROM SYS.SYSTABLE as st, sa_table_fragmentation() as tbfrg WHERE st.table_name = tbfrg.tablename
14
Table functions
Procedure may return only one result set
Statistics regarding cost, result set cardinality of the procedure are captured at run time; used for subsequent requests
Statistics are stored in SYS.SYSPROCEDURE
Minimally requires DBUPGRAD of older databases to 9.0.0
15
Recursive UNION
SQL-2003 implementation of recursive (bill-of-materials) queries
Only DB2 also offers RECURSIVE UNION support; Oracle implements a ‘cycle’ clause
Uses specialized join operators: recursive hash inner and outer joins
Will utilize a nested-loop strategy if inputs are small; done adaptively at run time during query execution
WITH RECURSIVE r (level, emp_id, manager_id) AS
  ( SELECT 1, emp_id, manager_id
    FROM employee
    WHERE emp_id = manager_id
    UNION ALL
    SELECT level+1, e.emp_id, e.manager_id
    FROM employee e JOIN r ON (e.manager_id = r.emp_id)
    WHERE e.emp_id <> e.manager_id AND level < 3 )
SELECT * FROM r
16
Recursive UNION: restrictions
Query expression must be UNION ALL
Recursive reference must be in a query block that does not contain DISTINCT, aggregation, or an ORDER BY clause
Recursive reference in a LEFT OUTER JOIN is permitted
Schema of WITH clause must match recursive query
Implicit type conversions involving truncation can yield undesired results; SQLSTATE 42WA2 returned if server detects a type mismatch
Use CAST to ensure compatible types
Infinite queries are possible; server kills the query after N recursions, controlled by the new connection option MAX_RECURSIVE_ITERATIONS (default 100)
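The recursive query from the previous slide can be tried out with a small stand-in. The sketch below uses Python's built-in sqlite3 module, not ASA; the sample rows are hypothetical, the column is called lvl instead of level, and the depth limit is passed as a parameter. The depth predicate in the recursive branch plays the role that MAX_RECURSIVE_ITERATIONS plays as a server-side safety net.

```python
import sqlite3

def employees_within(levels, rows):
    """Return (lvl, emp_id, manager_id) tuples for the top `levels` levels."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, manager_id INTEGER)"
    )
    con.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE r (lvl, emp_id, manager_id) AS (
            -- anchor: the root is the employee who is their own manager
            SELECT 1, emp_id, manager_id
            FROM employee
            WHERE emp_id = manager_id
          UNION ALL
            -- recursive branch: direct reports of rows already in r
            SELECT r.lvl + 1, e.emp_id, e.manager_id
            FROM employee e JOIN r ON e.manager_id = r.emp_id
            WHERE e.emp_id <> e.manager_id AND r.lvl < ?
        )
        SELECT lvl, emp_id, manager_id FROM r ORDER BY lvl, emp_id
    """
    result = con.execute(query, (levels,)).fetchall()
    con.close()
    return result

# Hypothetical hierarchy: 1 is the root; 2 and 3 report to 1; 4 to 2; 5 to 4.
SAMPLE = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 4)]
print(employees_within(3, SAMPLE))
```

With a limit of 3, employee 5 (four levels down) is not returned; without the depth predicate the recursion stops only when a pass produces no new rows.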
17
INTERSECT and EXCEPT
Implement set/bag difference and set/bag intersection
Both ALL and DISTINCT variants are supported; DISTINCT performed by default
Form query expressions in the same fashion as UNION
NULL treated as a special value in each domain, hence NULLs are equivalent to each other
Useful when formulating queries that require counting of identical rows
See the help for order-of-precedence amongst the set operators
18
EXCEPT and INTERSECT ALL
Rewrite to transform ALL to DISTINCT done automatically by the optimizer
Both EXCEPT and INTERSECT can be computed through either a merge or hashing technique
Also supports an (expensive) nested-loop strategy in case a cache shortage is encountered
With ALL variants:
implicitly performs aggregation to count the number of duplicate rows in each input
A new query execution operator, ROW REPLICATE, generates the required copies of each row
SELECT description FROM product
EXCEPT ALL
SELECT description FROM product AS p2 WHERE quantity < 15
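The ALL variants can be pictured as "count duplicates on each side, combine the counts, then replicate". The following is a minimal Python sketch of those semantics only; it is not how the server implements them, and the function names are made up for illustration.

```python
from collections import Counter

def except_all(left, right):
    """Bag difference: each row survives (count in left - count in right) times."""
    counts = Counter(left) - Counter(right)   # non-positive counts are dropped
    return sorted(counts.elements())          # elements() replicates each row

def intersect_all(left, right):
    """Bag intersection: each row survives min(count in left, count in right) times."""
    counts = Counter(left) & Counter(right)
    return sorted(counts.elements())

def except_distinct(left, right):
    """Set difference: duplicates are eliminated, as in plain EXCEPT."""
    return sorted(set(left) - set(right))

left = ["a", "a", "a", "b"]
right = ["a", "b", "c"]
print(except_all(left, right))       # two copies of "a" remain
print(intersect_all(left, right))
print(except_distinct(left, right))
```

The Counter step corresponds to the implicit aggregation, and elements() stands in for the ROW REPLICATE operator.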
19
GROUP BY ROLLUP
Computes aggregates as usual, but result set contains multiple sets of groups
Logically, grouping is performed N+1 times for N grouping expressions
Essentially implements the functionality of COBOL Report Writer in a single SQL request
SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY ROLLUP (state, zip)
20
GROUP BY CUBE
Computes aggregates as usual, but result set contains the power set of the N grouping expressions
Expensive to execute for large N
Result can be restricted through the specification of GROUPING SETS
SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY CUBE (state, zip)

SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY GROUPING SETS ( (state, zip), state, zip, () )
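To see why ROLLUP yields N+1 groupings and CUBE yields the power set, the grouping sets can be enumerated directly. This is an illustrative Python sketch; rollup_sets and cube_sets are hypothetical helper names, not server functions.

```python
from itertools import combinations

def rollup_sets(columns):
    """ROLLUP: every prefix of the column list, longest first (N+1 sets)."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]

def cube_sets(columns):
    """CUBE: every subset of the column list (2**N sets)."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets

cols = ["state", "zip"]
print(rollup_sets(cols))  # (state, zip), (state), ()
print(cube_sets(cols))    # (state, zip), (state), (zip), ()
```

For two columns the CUBE result matches the explicit GROUPING SETS list in the last query above; with ten grouping expressions CUBE already implies 1024 groupings, which is why it is expensive for large N.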
21
WINDOW functions
Part of SQL OLAP extensions
Computes aggregates (except LIST) over a window of rows
Provides an ANSI-compliant way to number the rows of a result set
ROW_NUMBER() rather than NUMBER(*)
Useful to:
Compute cumulative aggregates, or “moving averages”
Eliminate the need for correlated subqueries involving aggregation
22
WINDOW functions
List employees, by department, in four US states by their start dates, along with their cumulative salaries:
SELECT dept_id, emp_lname, start_date, salary,
       SUM(salary) OVER (PARTITION BY dept_id ORDER BY start_date
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"
FROM employee
WHERE state IN ('CA', 'UT', 'NY', 'AZ') AND dept_id IN ('100', '200')
ORDER BY dept_id, start_date;
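One detail of the RANGE frame is easy to miss: rows that tie on the ORDER BY value (same start_date within a department) are peers, so they all receive the same cumulative sum. The sketch below reproduces that behaviour in plain Python over hypothetical (dept_id, start_date, salary) tuples; it illustrates the standard SQL semantics and says nothing about the server's execution strategy.

```python
from itertools import groupby
from operator import itemgetter

def cumulative_salary(rows):
    """Return (dept_id, start_date, salary, running_sum) with RANGE semantics."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))              # PARTITION BY, ORDER BY
    for _, partition in groupby(ordered, key=itemgetter(0)):  # one department
        running = 0
        for _, peers in groupby(partition, key=itemgetter(1)):  # same start_date
            peers = list(peers)
            running += sum(salary for _, _, salary in peers)
            # every peer row sees the total including all of its peers
            out.extend((d, s, sal, running) for d, s, sal in peers)
    return out

rows = [
    (100, "2001-01-01", 10),
    (100, "2001-01-01", 20),
    (100, "2002-01-01", 5),
    (200, "2000-06-01", 7),
]
for row in cumulative_salary(rows):
    print(row)
```

A ROWS frame would instead give the two tied rows different running totals, in whatever order they happen to be processed.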
23
WINDOW functions
List all orders (with part information) where the part quantity cannot cover the maximum single order for that part:
SELECT order_qty.id, o.order_date, p.*, max_q
FROM ( SELECT s.id, s.prod_id,
              MAX(s.quantity) OVER (PARTITION BY s.prod_id ORDER BY s.prod_id) AS max_q
       FROM sales_order_items s ) AS order_qty, product p, sales_order o
WHERE p.id = prod_id AND o.id = order_qty.id AND p.quantity < max_q
ORDER BY p.id, o.id

The same request written with a correlated subquery:
SELECT o.id, o.order_date, p.*
FROM sales_order o, sales_order_items s, product p
WHERE o.id = s.id AND s.prod_id = p.id
  AND p.quantity < ( SELECT max(s2.quantity)
                     FROM sales_order_items s2
                     WHERE s2.prod_id = p.id )
ORDER BY p.id, o.id
24
WINDOW functions
Find the salespeople with the best sales (total amount) for each product, including ties:
SELECT v.prod_id, v.sales_rep, v.total_quantity, v.total_sales
FROM ( SELECT o.sales_rep, s.prod_id,
              SUM(s.quantity) AS total_quantity,
              SUM(s.quantity * p.unit_price) AS total_sales,
              RANK() OVER (PARTITION BY s.prod_id
                           ORDER BY SUM(s.quantity * p.unit_price) DESC) AS sales_ranking
       FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
       GROUP BY o.sales_rep, s.prod_id ) AS v
WHERE sales_ranking = 1
ORDER BY v.prod_id

The same request written with a correlated subquery:
SELECT s.prod_id, o.sales_rep, SUM(s.quantity) AS total_quantity,
       SUM(s.quantity * p.unit_price) AS total_sales
FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
GROUP BY s.prod_id, o.sales_rep
HAVING total_sales = ( SELECT FIRST SUM(s2.quantity * p2.unit_price) AS sum_sales
                       FROM sales_order o2 KEY JOIN sales_order_items s2 KEY JOIN product p2
                       WHERE s2.prod_id = s.prod_id
                       GROUP BY o2.sales_rep
                       ORDER BY sum_sales DESC )
ORDER BY s.prod_id
25
Data Management in 9.0.2
26
Moving to ASA 9.0.2
If database is 8.0.2, unload/reload to 9.0 is largely unnecessary
DBUPGRAD to 9.0 required for some catalog schema changes, in particular for the Index Consultant
There should be no consequences of using DBUPGRAD with respect to performance
However:
only 9.0 format databases support named constraints
only 9.0 format databases support cache warming
only 9.0.1 databases support page checksums
8.0.2 databases do not support index statistics collection by default
Can be turned on when creating the database via CREATE DATABASE (but not dbinit)
27
Moving to ASA 9.0.2
Otherwise, unload/reload from 8.0.1 or 8.0.0 recommended
Clustered index support
Better statistics management
Improved histogram organization, statistics collection
Index statistics kept persistent in the database file
Improved histograms
Cache warming on startup
Checksums on database pages
PCTFREE option for base and temporary tables
28
Moving to SQL Anywhere “Jasper”
The Jasper release of the SQL Anywhere server will not support older database formats
Jasper will ship with a migration tool to convert an existing database into a Jasper-format database
29
Database organization
A database consists of up to 13 “dbspaces”
Maximum size of each dbspace is limited by the underlying operating system
Maximum database size is also determined by page size
Limit for any dbspace is 2**28 (about 268 million) pages
Each dbspace, the temporary file, and the transaction log is a simple OS file
Ease of administration, backup
Temporary file is used for temporary tables
A dbspace file grows in 256K extents (512K if 16K pages, 1MB if 32K pages)
Database files can be copied to/from different endian machines
Can copy database from Wintel to big-endian UNIX systems and back again
Server automatically does data conversion where necessary
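The page-count limit translates into a size ceiling per page size. The arithmetic below is a sketch that uses only the figures on this slide (2**28 pages per dbspace, 13 dbspaces) and the page sizes supported by the server; it ignores operating system file size limits, which may be lower.

```python
MAX_PAGES_PER_DBSPACE = 2 ** 28   # limit quoted on the slide
MAX_DBSPACES = 13                 # limit quoted on the slide
PAGE_SIZES = [1024, 2048, 4096, 8192, 16384, 32768]

def max_dbspace_bytes(page_size):
    """Upper bound on one dbspace file, in bytes."""
    return MAX_PAGES_PER_DBSPACE * page_size

def max_database_bytes(page_size):
    """Upper bound if every one of the 13 dbspaces reached its limit."""
    return MAX_DBSPACES * max_dbspace_bytes(page_size)

for size in PAGE_SIZES:
    gib = max_dbspace_bytes(size) // 2 ** 30
    total_gib = max_database_bytes(size) // 2 ** 30
    print(f"{size // 1024:>2}K pages: {gib:>5} GiB per dbspace, {total_gib:>6} GiB total")
```

With 4K pages, for example, a single dbspace tops out at 1 TiB.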
30
Database organization
A database file contains:
table pages
index pages
free pages
rollback log pages
checkpoint log pages
Each dbspace for a database must use the same page size
31
Physical organization: tables
Each table uses an independent set of table pages
Each table allocates at least one page, even if the table is empty
Server maintains bit-maps for table pages
Supports clustering of table pages in the same portion of the database file
Facilitates large-block I/O – SQL Anywhere reads 64K at a time when doing sequential scans
Result: considerably faster sequential scan performance
32
Physical organization: tables
New in 8.0.2: ‘scattered read’ support on Windows 2000 and Windows XP
Another mainframe technology being reinvented on PC/UNIX servers, aka “locate-mode I/O”
Improves performance, reduces memory requirements
Coming to other platforms as vendors implement it
Tables cannot span dbspaces
Each secondary index on a table can be stored in a separate dbspace
Recommended if multiple spindles are available (not necessary for RAID devices)
Partition dbspaces on separate devices whenever possible
Brings more disk arms to bear, reducing seek latency
33
Physical organization: tables
Rows are inserted into pages at a point where, if at all possible, the entire row can be stored contiguously
Caveat: row segments are at most 4K; second or subsequent row segments can appear on different pages
Columns are packed tightly together; only unpadded values are stored on disk
Primary key columns are always at the beginning of each row, in sequence
Server may rewrite all rows if PK added or modified
Rows can be of (almost) unlimited size; are split across pages where necessary
Maximum length of any column is 2GB
Maximum number of rows per page is 255
34
Physical organization: tables
Rows are not guaranteed to be placed in pages corresponding to their insertion order
By default, ASA uses a first-fit algorithm for page selection
To guarantee ordering of a result set, specify an ORDER BY clause
Space is not reserved for columns that are null
BLOB values are stored in a separate “arena” of pages
First 255 bytes are stored together with the row
Access to the rest of the BLOB value will almost certainly require a SEEK
Implications for choice of page size
Once inserted, a row identifier is immutable
An updated row must be split if its new length does not allow it to fit on the page
35
Physical organization: tables
Table pages are allocated in 8 page clusters; cluster allocation depends on page size
2K: grow 4 clusters at a time
4K: grow 2 clusters at a time
All other page sizes: one cluster at a time
ASA will re-use database pages for additional inserts if entire pages are freed
Default free space per page: 100 bytes for 1K pages; 200 bytes for all other page sizes
DBA can specify freespace percentage to accommodate future table UPDATEs using PCTFREE
PCTFREE characteristic stored in new catalog table SYSATTRIBUTE (and corresponding table SYSATTRIBUTENAME)
Can be specified for temporary tables
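The cluster rules on this slide can be turned into bytes per growth step. This is a back-of-the-envelope Python sketch based only on the numbers above (8-page clusters, 4 clusters for 2K pages, 2 for 4K, 1 otherwise, and the default free-space figures); the function names are hypothetical.

```python
PAGES_PER_CLUSTER = 8  # from the slide

def clusters_per_growth(page_size):
    """Clusters added per growth step: 4 for 2K pages, 2 for 4K, else 1."""
    return {2048: 4, 4096: 2}.get(page_size, 1)

def growth_bytes(page_size):
    """Bytes of table pages added in one growth step."""
    return page_size * PAGES_PER_CLUSTER * clusters_per_growth(page_size)

def default_free_space(page_size):
    """Default free space per page in bytes when PCTFREE is not specified."""
    return 100 if page_size == 1024 else 200

for size in (1024, 2048, 4096, 8192, 16384, 32768):
    print(f"{size // 1024:>2}K pages: grow {growth_bytes(size) // 1024:>3}K, "
          f"reserve {default_free_space(size)} bytes per page")
```

For 2K, 4K, and 8K pages each growth step works out to 64K, the same figure quoted earlier for sequential-scan reads; whether that is by design is not stated in the slides.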
36
Page sizes
Page sizes supported are 1K, 2K, 4K, 8K, 16K, 32K
2K page size minimum on all UNIX platforms
Default changed to 2K in the 6.0.3 release
A server can support several databases concurrently
Buffer pool page size will be the largest database page size specified on the command line
Consider tradeoffs with your choice of page size
4K recommended; occasionally 8K may offer improved performance
Default will change to 4K with Jasper release
Do not use 16K or 32K pages unless you have a specialty application
In typical environments, large page sizes cause inefficient use of cache
37
Choice of page size does matter
Larger rows usually require larger pages (requires fewer split rows)
Random retrieval performance is dependent on the application
Larger pages can pollute the cache with unnecessary data
Often require larger buffer pools to accommodate the application’s working set
Smaller pages are more cache efficient, but
Smaller pages reduce index fanout, and can increase index depth
38
Choice of page size does matter
Don’t ignore index maintenance costs when considering page size (larger page sizes can mean increased cache pressure)
Test your application with different alternatives
Your mileage may vary
A 4K page size is a typical choice for many applications
My recommendation: use 4K pages unless thorough testing proves that a different page size offers better performance/scalability
See data storage whitepaper
Available at www.ianywhere.com/developer
Recently updated for 9.0.0
39
Physical organization: indexes
ASA 9.0 supports two different types of indexes:
Hash-based
Key is a one-way order-preserving encoding of at most nine bytes of the data values
Hash-based indexes are still used when the key length does not satisfy the limits for compressed indexes
Compressed
Contains Patricia tries in the index’s internal nodes
Used for keys > 10 bytes and less than
122 bytes with 1K pages
248 bytes for all other page sizes
Substantially improved performance with larger keys
40
Physical index organization: hash-based indexes
Values in an index are “hashed” into a key of at most 10 bytes using an order-preserving encoding function
WITH HASH SIZE is deprecated
Each indexed column encoded separately, with a one-byte length
A 10-byte hash value can hold two 32-bit integer values (including two length bytes)
Hash values in an index are stored separately from the index entry itself
The hash value for an identical secondary key is shared for each index entry (row) in that index page
This improves fanout when data distribution is skewed
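The 10-byte budget is simple arithmetic: each indexed column costs its encoded width plus a one-byte length. The sketch below only checks whether a list of column widths fits; it does not attempt the order-preserving encoding itself, whose details are not given in the slides, and it assumes a fixed-width value encodes to exactly its width.

```python
HASH_KEY_LIMIT = 10   # bytes, from the slide
LENGTH_PREFIX = 1     # one length byte per indexed column, from the slide

def encoded_size(column_widths):
    """Bytes needed if every column were stored in full with its length byte."""
    return sum(LENGTH_PREFIX + width for width in column_widths)

def fits_in_hash(column_widths):
    """True if the full key fits within the hash value."""
    return encoded_size(column_widths) <= HASH_KEY_LIMIT

print(encoded_size([4, 4]), fits_in_hash([4, 4]))        # two 32-bit integers
print(encoded_size([4, 4, 4]), fits_in_hash([4, 4, 4]))  # a third one overflows
```

Two 32-bit integers use exactly 2 x (1 + 4) = 10 bytes; anything longer cannot be held in full in the hash value.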
41
Physical index organization: Compressed indexes
Internal nodes in the index contain a Patricia trie
PATRICIA: Practical Algorithm to Retrieve Information Coded in Alphanumeric (D. R. Morrison, J. ACM Vol. 15, 1968)
Combines a binary trie with an optimization to skip over bit comparisons that would result from one-way branching
Result: automatic compression of string data
Excellent fanout of internal nodes
Common substrings of key values have a negligible impact on space requirements and performance
Superb performance improvements in many cases, especially with composite primary and foreign keys
42
Clustered index support
First offered with the 8.0.2 release
At most one clustered index per table (may be a temporary table)
May be secondary index, PK, FK, UNIQUE constraint
Optimizer assumes PK indexes are clustered unless a different clustering index exists
Engine will not attempt to maintain clustering on PK indexes unless they are declared CLUSTERED
May be hash or compressed index
Clustering characteristic stored in SYSATTRIBUTE catalog table
CLUSTERED keyword can be used in both CREATE INDEX and CREATE/ALTER TABLE statements
However, ALTER does not reorganize the table; use REORGANIZE TABLE
43
Clustered index support
On INSERT/LOAD TABLE, server attempts to keep rows physically adjacent in base table pages
Specification of PCTFREE on LOAD can be critical
Adjacency is NOT guaranteed; ORDER BY still requires a physical sort or indexed retrieval
Can significantly improve performance
Optimizer costs clustered index access differently
Consider their use with queries that involve range predicates
Often useful with DATE or TIMESTAMP columns
Use REORGANIZE TABLE or UNLOAD/RELOAD if clustering degrades over time
ALTER INDEX statement can rename an index or change its clustering attribute
44
Physical index organization: fanout and page size
Fanout refers to the number of index entries on a page
Lower fanout means greater index depth, and hence more costly random retrieval
Fanout is affected by
Page size
Hash value size/trie compression
Distribution of key values
Index maintenance
Fanout can degrade over time
sa_index_density() procedure
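The relationship between fanout and depth can be made concrete with an idealized model: a perfectly balanced index in which every page holds exactly `fanout` entries. The Python sketch below is a rough estimate under that assumption only; real indexes have uneven page fill, which is what sa_index_density() reports.

```python
def index_depth(entries, fanout):
    """Levels needed to address `entries` keys with `fanout` entries per page."""
    if entries < 1 or fanout < 2:
        raise ValueError("need at least one entry and a fanout of 2 or more")
    depth, capacity = 1, fanout
    while capacity < entries:
        capacity *= fanout   # each extra level multiplies reachable entries
        depth += 1
    return depth

for fanout in (10, 100, 400):
    print(f"fanout {fanout:>3}: depth {index_depth(1_000_000, fanout)} for 1M entries")
```

Each additional level is potentially one more page read per random lookup, so a drop in fanout (smaller pages, wider hash values, or degraded density) shows up directly in retrieval cost.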
45
Indexes and query processing
ASA does not store actual data values in the index
implies each base row must be retrieved to
Fetch the values of any attributes, or
To compare keys longer than the maximum hash value size
Indexes are automatically created to enforce referential integrity
Primary keys, foreign keys, unique constraints
All related indexes must be the same type (hash or compressed)
Maximum number of indexes is dependent on page size
<= 4K: 2048 indexes
8K: 1024 indexes
16K: 512 indexes
32K: 256 indexes
46
Indexes and query processing
Each indexed column can be ascending or descending
Index is scanned backwards if the application scrolls in the opposite direction, or an ORDER BY clause specifies the reverse sequence
Support for merge and hash joins means that ASA will often use sequential scans, rather than indexed retrieval
47
REORGANIZE Statement – base tables
REORGANIZE TABLE tablename
Defragments rows on-the-fly by removing/inserting groups of rows in clustered index (or PK) order
Exclusive lock held on the table while a group is processed; checkpoints are suspended while the group is being processed
Performs implicit COMMITs periodically during the operation to enable other applications to run
Rows will be in clustered sequence when operation is complete (except possibly concurrent UPDATEs)
Use new procedure sa_table_fragmentation() to discover tables that warrant reorganization
48
REORGANIZE Statement - indexes
REORGANIZE TABLE tablename [ index specification ]
INDEX indexname
FOREIGN KEY indexname
PRIMARY KEY
Exclusive lock is held throughout
CHECKPOINTs are suspended
Reclaims space lost to update activity
Re-balances the index, especially important after many DELETE operations
Use the new procedure sa_index_density() to identify indexes that require reorganization
49
Data management improvements in 9.0.1
Better scalability – new lock-free cache manager
Substantially better performance across the board
Support for page checksums
New option for dbinit and CREATE DATABASE statement
Supported by dbvalid utility, and a new statement VALIDATE CHECKSUM
Overhead: largely depends on CPU speed. Examples:
2.8 milliseconds per I/O for 32K pages
0.7 milliseconds per I/O for 8K pages
Improvements to dynamic cache sizing
Sampling rate changes with database growth or the starting of a new database on the same server
50
Data management improvements in 9.0.1
Database cache warming feature
Two operational phases, collection and reload
During collection, page IDs are saved in the database as they are accessed at startup
During reload, collected page IDs are read into cache as background processing
Checks and balances used to prevent swamping the server with I/O during server startup
Need to test performance before deploying
Cache warming is *enabled* by default
51
Data management improvements in 9.0.1
Optimistic locking introduced for WAIT_FOR_COMMIT
Controlled by a new connection option OPTIMISTIC_WAIT_FOR_COMMIT
Temporary dbspace can be grown with ALTER DBSPACE
Can improve performance of complex queries by ensuring that the temp file is not fragmented on disk
Size of temporary dbspace can be controlled with a governor
New public option TEMP_SPACE_LIMIT_CHECK (default OFF)
When OFF, engine’s default behaviour is to die with a DISK FULL error
Jasper release: default is ON
Server computes a temp space quota for each request; if quota is exceeded and temporary dbspace is at least 80% of its maximum size, request fails with SQLSTATE 54W05
Quota computed using amount of disk free space on that partition, and number of active connections
Shipped in 9.0.0 build 1308, 9.0.1 build 1872, 8.0.3 build 4991
52
Data management improvements in 9.0.1
ALTER INDEX statement
Can rename an index, or alter its clustering attribute
Ability to create an index on a function
Automatically adds a computed column “column-name” to the table
Creates an index on the computed column
Relies on the optimizer to replace any function occurrences with the computed column
CREATE INDEX index-name ON [owner.]table-name ( function( arg [, ...] ) AS column-name ) [{IN | ON} dbspace-name]
53
Data management improvements in 9.0.1
Non-transactional temporary tables
Unaffected by COMMIT or ROLLBACK; no entries made to rollback log
Procedure, trigger, and view text can be hidden from other users by using SET HIDDEN (8.0.2)
LOAD TABLE enhancements:
can be used on local temporary tables (8.0.2)
ORDER clause (8.0.2)
Control over which column histograms are built (9.0.0)
54
Data management improvements in 9.0.1
DEDICATED_TASK option (DBA-only, temporary only)
UUIDs and GUIDs can be used as surrogate keys - see newid() function (8.0.2)
XML data type
SYSHISTORY system table
Statistics (depth, leaf pages) maintained on indexes in real time (introduced in 8.0.2EBF)
Hash(), compress(), encrypt() builtin functions
Can be used to compress or encrypt individual string or binary fields in the database
Values can be viewed, processed with decrypt() and decompress() functions
55
Data management improvements in 9.0.1
ALTER DATABASE can now modify transaction log identically to DBLOG utility
BACKUP and DBBACKUP can now rename the log copy
ALTER VIEW WITH RECOMPILE
Event handling improvements:
Two new parameters for event_parameter:
APPINFO
DisconnectReason: ‘from client’, ‘drop connection’, ‘liveness’, ‘inactive’, ‘connect failed’
New cost model for Ultralite requests
New DTT function based on analysis of several current models of pocket PC devices
Equates random and sequential I/O to produce better Ultralite query plans
56
Data management improvements in 9.0.2
Temporary stored procedures
They are visible only to the connection which creates them, and are automatically dropped when the connection is dropped.
They can be explicitly dropped, but may not be ALTERed.
GRANT and REVOKE are not permitted on temporary procedures.
They are not recorded in the catalog or in the transaction log.
They can be created and dropped when connected to a read-only database.
A procedure owner cannot be specified for temporary procedures. Rather, they are owned by the user that creates them.
Temporary external procedures are not permitted.
Temporary procedures execute with the permissions of their creator (i.e. the current user).
57
Data management improvements in 9.0.2
CREATE LOCAL TEMPORARY TABLE
Defines a local temporary table which will persist until the end of a connection, or until the table is explicitly dropped.
Intended for use inside procedures, functions, triggers
Similar to DECLARE LOCAL TEMPORARY TABLE if executed outside of a procedure context
UUIDs are now a native SQL Anywhere type
UUID_HAS_HYPHENS option
Controls formatting of UUIDs (UniqueIdentifier values) when converted to strings
Disk-full callback support
MIN_TABLE_SIZE_FOR_HISTOGRAM is deprecated
New option COLLECT_STATISTICS_ON_DML_UPDATES
New option LOG_DEADLOCKS, sa_report_deadlocks() procedure
Enhancements to START DATABASE statement: WITH DISTINCT SQLSTATE
58
Application profiling improvements in 9.0.2
Procedure profiling can now be performed for an individual connection or user
call sa_server_option('Profile_connection',<connection-id>)
call sa_server_option('ProfileFilterUser','<userid>')
Request-level logging enhancements:
New -zn switch to retain n log files in a ring
Or use sa_server_option('RequestLogNumFiles',<n>)
Can log either text or the plan for expensive queries (9.0.2EBF)
-zx <cost> specifies the threshold cost; a statement is logged if its cost exceeds the threshold at either optimization or execution time
Call sa_server_option('LogExpensiveQueries')
When -zp is also specified, the plans are output; otherwise, only the statement text is logged
59
Physical database design tips
60
Physical database design tips: file placement
Database file placement
Place transaction log, database file(s), and temporary directory on separate devices if possible
if using mirrored logging, ensure the two logs are on different physical disks
Temporary file placement can dramatically affect performance of complex queries
Use the ASTMP environment variable to specify location for temporary file
Place on a different physical drive if possible
The more disk heads the better (RAID)
61
Physical database design tips: file placement
Consider the use of caching disk controllers/NT striping/RAID
Consider the tradeoffs
Software striping offers better performance, but offers no recovery advantages
RAID 5 tends to have poor write request latency: each I/O turns into four write requests that take place serially
Not good for a transaction log
RAID 10 (1+0) offers much better performance, at the cost of redundancy
62
File system considerations
Defragment your file system occasionally, especially after an unload/reload
Database file fragmentation is now displayed in the console window when the database is started
Preallocate large quantities of space in contiguous chunks through the ALTER DBSPACE command
Less problematic with 256K block allocation in recent ASA releases
ALTER DBSPACE <dbspace-name> INSERT nnn {PAGES | KB | MB | GB | TB}
Can also do this for the TEMPORARY DBSPACE
Use db_extended_property() function to determine fragmentation/size of each dbspace individually (new in 9.0, also in 8.0.2.4215)
Can be done for temporary dbspace and the transaction log as well
63
File system considerations
Use caution when trying to run the database over a networked drive!
Not all networks and/or operating systems guarantee network packet ordering
Physical or logical corruption is likely
Can use “-r” (read-only) switch if necessary
SAN units are supported; they guarantee consistent semantics
Do not use cached filesystem writes unless persistence is guaranteed
Corruption is virtually certain and database cannot be recovered; will need to restore database from backup
64
Database fragmentation
ASA databases never shrink
Free pages will be reused for other purposes
Unload/reload will recover this unused space
If data is removed in the order it was inserted, fragmentation is less likely
Avoid inserts of NULL values followed by updates with actual data
Use PCTFREE if necessary
Repair fragmentation with unload/reload, or REORGANIZE TABLE
Useful tools
DBINFO -u
stored procedure sa_table_fragmentation()
65
Physical database design tips: tables
Load table data in clustering order (by default, primary key sequence)
Sorting automatically performed by DBUNLOAD and by the REORGANIZE TABLE statement
New ORDER syntax for (UN)LOAD TABLE
Use 4K pages unless conditions warrant otherwise
Watch for ordering, placement of PK columns
Order in table dictates order in index
Changed in Jasper!
Rows are rewritten if PK columns, or column order, is changed
66
Physical database design tips: tables
Use of out-of-range default values instead of NULL
Reduces page fragmentation with updates
Can use PCTFREE as an alternative
Put large columns at end of row; fixed-size and frequently-accessed columns near start
Prevent seeks to another table page, required to access split rows
Choose your data types with care; trade off storage efficiency against application requirements
For keys, alphanumeric strings are often more flexible
67
Physical database design tips: indexes
Compressed indexes prevent many of the problems with relatively large or composite primary keys
However:
Surrogate keys can still be useful
Usually not a good idea for significant business objects to have the same key format
Self-checking keys can simplify business processing
Watch for opportunities to specify a clustering index
Especially with date or timestamp columns used in range queries
Useful stored procedures:
sa_index_levels()
sa_index_density()
68
Physical database design tips: surrogate keys
Consider surrogate keys when appropriate
Exploit autoincrement support, or develop self-checking keys to simplify error detection
9.0 and 8.0.2 support automatic generation of universally unique identifiers (UUIDs) as surrogate keys
Compatible with Microsoft’s implementation
New native domain: uniqueidentifier in 9.0.2
No longer necessary to use string conversion functions such as uuidtostr(); type conversion done automatically
Trade off their characteristics against GLOBAL AUTOINCREMENT
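As an illustration of a self-checking key, the sketch below appends a check digit to a numeric surrogate key. The slides do not name a scheme; the Luhn algorithm is used here only because it is widely known, and the function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def make_key(payload: str) -> str:
    """Append the check digit to a numeric key value."""
    return payload + str(luhn_check_digit(payload))


def is_valid_key(key: str) -> bool:
    """Verify that the last digit matches the check digit of the rest."""
    if len(key) < 2 or not key.isdigit():
        return False
    return luhn_check_digit(key[:-1]) == int(key[-1])


if __name__ == "__main__":
    key = make_key("7992739871")
    print(key, is_valid_key(key))
```

A key built this way lets an application reject most single-digit typing errors before the value ever reaches the database.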
69
Physical database design tips: foreign keys
Foreign keys are essential to the optimization of complex queries
Join selectivity and cardinality estimation is much more accurate when foreign key constraints are present
Also enable a variety of query rewrite optimizations
But there is a tradeoff in using declarative referential integrity
Downside is the maintenance cost for indexes that are not utilized in query processing
In rare situations, consider eliminating some RI and check constraints once application is fully tested
70
Physical database design tips: triggers, constraints
Use declarative referential integrity instead of triggers
Use CHECK constraints rather than triggers for simple conditions
9.0 supports named constraints
Unnamed constraints are automatically named as ‘ASAnnn’
Mark columns as NOT NULL when appropriate
Don’t over-use CHECK constraints
e.g. in user-defined data types
Using a user-defined function in a CHECK constraint will guarantee poor update performance
71
Server configuration tips: cache size
Dynamic cache sizing is instituted by default on platforms that support it
Not supported for CE, Netware
Can override dynamic cache sizing as necessary
Server can dynamically adjust cache size depending on server workload; this is more robust in 9.0.1
Use -ch to specify an upper bound larger than 256MB
If specifying cache size at startup:
Need to allow for OS and application overhead
CE has different defaults than other platforms
Java-enabled databases require a larger minimum cache for the Java VM - 8MB usually sufficient
Watch for NT File Cache competition
See white paper on memory usage (available at http://www.ianywhere.com/developer)
72
Data management in Jasper
Statements concerning iAnywhere Solutions' new products are forward-looking statements that involve a number of uncertainties and risks and cannot be guaranteed. Factors that could ultimately affect such statements are detailed from time to time in Sybase's Securities and Exchange Commission filings, including but not limited to its annual report on Form 10-K and its quarterly reports on Form 10-Q (copies of which can be viewed on the Company's website).
All of the information in this presentation are forward-looking statements, as defined above. As such, there is uncertainty associated with if or when any of these features will be added to the product.
73
Data management changes in Jasper
Default page size changed to 4K
New catalog implementation
Catalog base tables have been renamed
All catalog access by applications is through views
Catalog base tables are reorganized, more efficient
View dependencies on base tables and views are now tracked
Improved storage organization for BLOB columns
In-row BLOB prefix default is no longer fixed at 254:
CHAR/VARCHAR: minimum 8, maximum 128
BINARY/VARBINARY: minimum 0, maximum 256
Can override on per-column basis
New storage architecture for long values, permits efficient random access
74
View dependency tracking
Three states for any view:
Valid: compiled and active, can be utilized in queries
Invalid: view has been invalidated by the server due to dependency checking as a result of DDL on base tables
Upon reference, the server will attempt to compile the view and use it if possible
Otherwise, query will get an error
Disabled: view has been explicitly disabled (via new statement, DISABLE VIEW), and is unusable
View must be explicitly enabled in order to become valid (via new statement, ENABLE VIEW)
75
View dependency tracking
Upon an ALTER (or DROP):
Server attempts to acquire an exclusive lock on the object to be modified
Server honours the current setting of the BLOCKING option
Server then acquires exclusive locks on all dependent views
If any lock cannot be acquired, the statement gets an error
Once locked, all dependent views are invalidated
ALTER (or DROP) statement is executed
With ALTER, the server attempts to revalidate all the previously invalidated views
Views successfully recompiled are marked as valid
Otherwise, the view is left in the invalid state
Server will attempt to recompile it when
First referenced in a server session, or
When other DDL is performed that may affect that view
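The three view states and the invalidate/revalidate cycle described on these two slides can be sketched as a small state machine. This is a toy model under stated assumptions, not the server's implementation: locking is omitted, each view depends on a single base table, a view "compiles" when all of its columns still exist, and disabled views are assumed to be left untouched by DDL. All class and method names are hypothetical.

```python
from enum import Enum


class ViewState(Enum):
    VALID = "valid"
    INVALID = "invalid"
    DISABLED = "disabled"


class ToyCatalog:
    """Hypothetical in-memory catalog: each view depends on one base table."""

    def __init__(self):
        self.tables = {}  # table name -> set of column names
        self.views = {}   # view name -> {"table", "columns", "state"}

    def _compiles(self, view):
        # A view "compiles" if every column it references still exists.
        return view["columns"] <= self.tables.get(view["table"], set())

    def create_table(self, name, columns):
        self.tables[name] = set(columns)

    def create_view(self, name, table, columns):
        view = {"table": table, "columns": set(columns), "state": ViewState.VALID}
        if not self._compiles(view):
            raise ValueError("view does not compile")
        self.views[name] = view

    def alter_table(self, table, columns):
        # Invalidate dependent views, run the DDL, then try to revalidate them.
        dependents = [v for v in self.views.values()
                      if v["table"] == table and v["state"] is not ViewState.DISABLED]
        for view in dependents:
            view["state"] = ViewState.INVALID
        self.tables[table] = set(columns)
        for view in dependents:
            if self._compiles(view):
                view["state"] = ViewState.VALID

    def set_enabled(self, name, enabled):
        # Stands in for the explicit enable/disable statements.
        view = self.views[name]
        if not enabled:
            view["state"] = ViewState.DISABLED
        elif self._compiles(view):
            view["state"] = ViewState.VALID
        else:
            view["state"] = ViewState.INVALID  # assumption: enable can fail to compile

    def reference(self, name):
        view = self.views[name]
        if view["state"] is ViewState.INVALID and self._compiles(view):
            view["state"] = ViewState.VALID  # recompiled on reference
        if view["state"] is not ViewState.VALID:
            raise RuntimeError(f"view {name} is {view['state'].value}")
        return sorted(view["columns"])
```

The point of the model is that an invalid view can heal itself once the base table supports it again, while a disabled view stays unusable until it is explicitly enabled.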
76
Internationalization improvements
Support for NCHAR data type
NCHAR strings are stored as UTF-8
NCHAR specification and functions use character semantics, not byte semantics
NCHAR(10) means 10 characters (1-4 bytes per character)
CHAR specification now supports either BYTE or CHAR modifier
E.g. CHAR(10 BYTE) or CHAR(23 CHAR)
NCHAR can support either:
UCA (Unicode Collation Algorithm) using IBM’s ICU library
Properly supports multi-byte character sorting
A legacy collation stored as UTF-8
Database now can have two collations, one for NCHAR, one for CHAR
Details in session SQL506 Monday afternoon
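The difference between character and byte semantics is easy to see outside the database. The sketch below uses Python's own string handling, treating one Unicode code point as one character (an assumption; the server's exact definition of a character is not given in the slides). The helper names are hypothetical.

```python
def utf8_byte_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_char_semantics(text: str, limit: int) -> bool:
    """Would the text fit a column declared with a length in characters?"""
    return len(text) <= limit


def fits_byte_semantics(text: str, limit: int) -> bool:
    """Would the text fit a column declared with a length in bytes?"""
    return utf8_byte_length(text) <= limit


if __name__ == "__main__":
    for sample in ("plain text", "naïve café", "日本語"):
        print(sample, len(sample), utf8_byte_length(sample))
```

A ten-character string with accented letters fits a ten-character limit but not a ten-byte one, which is the distinction the BYTE and CHAR modifiers make explicit.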
77
Indexing changes
New index implementation
Improved implementation of compressed B-tree indexes
Key values are duplicated in the index to support index-only retrieval and snapshot isolation
Older “hash”-based indexes have been dropped entirely
Index column order for primary keys now based on PK constraint declaration, not column order in table
PK can be altered, reordered without rewriting all the rows in the table
Ordering can now be specified with any constraint index
e.g. PRIMARY KEY (X ASC, Y DESC, Z ASC)
Foreign key column order can now be different than that of PK
All indexes now appear in the SYSINDEXES view
Planned:
Ability to declare that a FK is unique (to enforce a 1:1 relationship)
Abstract indexes into logical and physical implementations
Redundant indexes will not be created
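To make the mixed ordering in PRIMARY KEY (X ASC, Y DESC, Z ASC) concrete, the sketch below sorts rows the way such a key declaration would order its entries. It assumes integer columns so that descending order can be expressed by negation; the function name is hypothetical and nothing here reflects the index's physical structure.

```python
def key_order(rows):
    """Order (x, y, z) integer rows as X ascending, Y descending, Z ascending."""
    return sorted(rows, key=lambda row: (row[0], -row[1], row[2]))


if __name__ == "__main__":
    rows = [(1, 1, 2), (1, 2, 1), (0, 5, 9), (1, 2, 0)]
    for row in key_order(rows):
        print(row)
```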
78
Shareable global temporary tables
Shared global temporary tables
New syntax: CREATE GLOBAL TEMPORARY TABLE ….. SHARE BY ALL
The contents of the table will persist until explicitly deleted or until the database is shut down. On database startup, the table will be empty.
Row locking on shared temporary tables behaves the same as for permanent tables
Inserts, updates and deletes on shared temporary tables are not recorded in the transaction log
Column statistics are maintained in memory by the server.
79
Data management changes in Jasper
Last modification time for any row in a table now retained in SYSTABLE
Resolution is one second
LOAD TABLE enhancements: better performance, ENCODING option, ROW DELIMITED BY option
Apply multiple transaction logs at startup (can specify a directory)
Better row-level locking implementation
Elimination of key-range locking with anti-insert locks
Planned: introduction of INTENT locks (e.g. FETCH FOR UPDATE)
Improved administration of large databases:
Parallel backup
Auto-tuning to exploit multiple CPUs on SMP hardware
Faster unload/reload, index creation, database validation
80
Database mirroring
Provides “hot” failover for a SQL Anywhere database
Involves two or three separate servers: primary, mirror, arbiter
Transaction log pages are passed from the primary server to the mirror to keep the mirror up-to-date
Mirror server is not accessible by any other connections
Effectively the mirror server is in continuous recovery mode
Log pages can be passed in three modes:
Synchronously (default) on COMMIT
Asynchronously on COMMIT: better performance than synchronous mode
Asynchronously when log page is full, with a timeout option
Async implies the usual caveats with possible lost transactions
Role switch occurs if primary server fails
Arbiter used to verify the mirror state before role switch proceeds
Clients are disconnected from the primary server
Must reconnect to the mirror
See Techwave session SQL508, High Availability ASA, on Wednesday
81
Snapshot isolation support
Provides read-consistency in the face of concurrent writes from other transactions (e.g. writers do not block readers)
Enabled by a global database option, allow_snapshot_isolation
Three new transaction isolation levels:
“snapshot”: cleanest semantics, transaction sees a consistent view of the database as of transaction start (the time the first row was accessed)
“stmt-snapshot”: requires fewer resources; each statement sees a consistent state of the database, but different statements may see it as of different times
Only one snapshot time exists for a connection; outermost or first statement sets the transaction time
“read-only-stmt-snapshot”: like stmt-snapshot, but only for queries; update statements execute at isolation level 1
Usage is not free
Old copies of rows are maintained in a “row version store” (part of the database’s temp file) for as long as necessary to ensure consistency for any transaction
Indexes have a mix of “old” and “current” values
Can affect the performance of both sequential and index scans
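The difference between the transaction-level and statement-level snapshots can be sketched with a toy version store. This is an illustrative model only, not the server's row version store: it assumes a single logical clock, committed writes only, and treats each read as one statement. All class names are hypothetical.

```python
class ToyVersionStore:
    """Keeps every committed version of each key, tagged with a commit time."""

    def __init__(self):
        self.clock = 0
        self.versions = {}  # key -> list of (commit_time, value), oldest first

    def commit_write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read_as_of(self, key, snapshot_time):
        # Return the newest version committed at or before snapshot_time.
        visible = None
        for commit_time, value in self.versions.get(key, []):
            if commit_time <= snapshot_time:
                visible = value
        return visible


class SnapshotTransaction:
    """Snapshot time is fixed when the first row is accessed."""

    def __init__(self, store):
        self.store = store
        self.snapshot_time = None

    def read(self, key):
        if self.snapshot_time is None:
            self.snapshot_time = self.store.clock
        return self.store.read_as_of(key, self.snapshot_time)


class StatementSnapshotTransaction:
    """Each read (one statement in this model) takes a fresh snapshot."""

    def __init__(self, store):
        self.store = store

    def read(self, key):
        return self.store.read_as_of(key, self.store.clock)
```

Old versions must stay in the store for as long as any transaction's snapshot time still needs them, which is why long-running snapshot transactions make the version store grow.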
82
Snapshot isolation support
Setting the isolation level:
set transaction isolation level snapshot
set transaction isolation level statement snapshot
set transaction isolation level read only statement snapshot
Or within an ODBC application, use
SA_SQL_TXN_SNAPSHOT
SA_SQL_TXN_STATEMENT_SNAPSHOT
SA_SQL_TXN_READ_ONLY_STATEMENT_SNAPSHOT
Update conflicts are still possible
Isolation levels can be mixed (but not recommended)
Database property VersionStorePages contains the number of pages in the temp file devoted to copies of old rows
BLOB values do not reside in the temp file, but remain in the main database file and are reference counted
Some restrictions on DDL when snapshot transactions are in progress (ALTER TABLE, etc.)
83
Lazy CHECKPOINTs
A Jasper server can now initiate a CHECKPOINT and perform other operations while it takes place.
In previous releases, all database activity would stop while the CHECKPOINT took place.
There can only be one CHECKPOINT in progress at a time.
If a CHECKPOINT is already in progress, then any operation like an ALTER TABLE or CREATE INDEX that wants to initiate a new CHECKPOINT needs to wait for the last one to finish.
Lazy checkpoints are not used when the server is run with the -m option
Documented by START CHECKPOINT and FINISH CHECKPOINT records in the transaction log
84
Application profiling and request-level logging
Major enhancements in the Jasper release
Unified logging architecture
Can log data to a database, rather than a flat file
Can log data to a different database, even on another server
Much lower overhead
Considerably greater detail in diagnostic information
Lock contention
Statements within stored procedures and triggers
Elapsed times
Query plans
Planned improvements to DBCONSOLE for real-time server status
Attend sessions SQL501/514 Tuesday afternoon at 1:30: ASA Performance Analysis from Start to Finish
85
iAnywhere at TechWave 2005
Ask the iAnywhere Experts on the Technology Boardwalk (exhibit hall)
• Drop in during exhibit hall hours and have all your questions answered by our technical experts!
• Appointments outside of exhibit hall hours are also available to speak one-on-one with our Senior Engineers. Ask questions or get your yearly technical review; ask us for details!
TechWave ToGo Channel
• TechWave ToGo, an AvantGo channel providing up-to-date information about TechWave classes, events, maps and more, now available via your handheld device!
• www.ianywhere.com/techwavetogo
iAnywhere Developer Community - A one-stop source for technical information!
• Access to newsgroups, new betas and code samples
• Monthly technical newsletters
• Technical whitepapers, tips and online product documentation
• Current webcast, class, conference and seminar listings
• Excellent resources for commonly asked questions
• All available express bug fixes and patches
• Network with thousands of industry experts
http://www.ianywhere.com/developer/
86
SQL Anywhere ‘Jasper’ Release
Learn more about 'Jasper', the upcoming SQL Anywhere release, loaded with features focused on:
• Enhanced data management including performance, data protection, and developer productivity
• Innovative data movement including manageability, flexibility and performance, and messaging
Attend the following sessions:
SQL Anywhere 'Jasper' New Feature Overview Session SQL512 will be held Monday, August 22nd, 1:30pm
MobiLink 'Jasper' New Feature Overview Session SQL515 will be held Wednesday, August 24th, 1:30pm
... and remember to look for sneak peeks in other sessions and morning education courses!
Register for the Jasper Beta program: www.ianywhere.com/jasper
87
Questions
?