using sql databases from apl (dyalog & other) j. merrill analytical software corp....

98
Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. [email protected]

Upload: elinor-rodgers

Post on 23-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Using SQL Databases from APL (Dyalog & other)

J. Merrill

Analytical Software Corp.

[email protected]

Page 2: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Overview

• About my APL and SQL experience• This talk emphasizes information about SQL

– Concepts you must know– Weaknesses you need to understand– Ideas about SQL database design

• Why so much about SQL?– Could a few people learn the details about SQL, and

build tools to shield other developers from the issues?• Why you should not hide SQL from APLers

– The very successful project that was a failure• APL talking to SQL is not hard to do (any more)

Page 3: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

My Background

• I started working with APL professionally in 1975, as a user of STSC timesharing.

• While working at STSC, I took a course in relational database technology and SQL from a representative of a small company called Relational Software Inc. Later they changed their name to Oracle – I’ve worked with SQL for longer than Oracle Corporation has existed!

Page 4: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Work at STSC

• I automated connection of an Oracle database of microcomputer software sales data to the existing (APL-based) accounting system.

• I designed a portable interface between STSC’s APL*Plus and SQL on these platforms:– VMS talking to RDB– VMS talking to Oracle– Unix talking to Oracle

and implemented the APL in those interfaces.

Page 5: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

STSC, Manugistics, and Later

• I was a software development consultant for many years with both STSC and (after the name change) Manugistics.

• Client projects included migrating an APL-based database of foreign exchange trading data to Microsoft SQL Server.

• Since founding Analytical Software Corp. I’ve worked extensively with clients using SQL Server from APL+Win.

Page 6: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Concepts You Must Know

• SQL has only four data manipulation verbs

• RDBMSs use Client / Server techniques

• SQL has the concept of null

• All data is in a table (like an APL matrix)

• SQL columns are named, not numbered

• SQL columns are “strongly typed”

• Relationships are data not structure

Page 7: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

DML: select insert update delete

• SQL’s select is extraordinarily versatile• SQL’s insert and update normally operate

on a single row at a time and therefore are quite primitive

• If insert and update work on multiple rows, not a single row, they become very powerful (as powerful as select)

• SQL’s delete is usually simple but can use some powerful select techniques

Page 8: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Has DML and DDL

• DML means data manipulation language

• SQL’s four DML verbs are select, insert, update, delete

• For defining the database structure (schema), SQL has DDL

• DDL means data definition language

• DDL is less standardized between products

Page 9: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL select is Very Powerful

• select supports extremely complex statements that can access data from multiple tables, potentially joining them using every available join technique in a single query

• Much of select’s power is based on the ability to define derived tables within a statement, and use those tables as if they held real data – but no data is actually stored to run the query– The syntax is intended let you express intent without

specifying a particular processing sequence

• Powerful does not mean simple or intuitive

Page 10: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

insert & update: SQL’s Weak Spot

• When data is in memory (in an application)– insert statements allow you to add one row,

giving values for each column in the new row– update statements typically reference a single

row by its primary key (though it can change many columns in one update)

• Applying a group of changes based on user input typically involves issuing many one-row-each insert or update statements

Page 11: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Multi-Row insert and update

• insert supports adding the result of any select to an existing table

• update similarly can update many rows of one table from the contents of another table, providing the tables can be joined (usually on the primary key)– The syntax for this is not completely standard

and some products may not support this

• Get the data on the server if possible!

Page 12: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL delete Does What It Says

• delete can use the full power of select to determine what row or rows of a table should be deleted

• When deleting rows based on a list of primary key values, implementations that limit the length of a SQL statement (or an individual clause within the statement) can require multiple delete statements to delete a large number of rows

Page 13: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Concepts You Must Know

• SQL has only four data manipulation verbs

• RDBMSs use Client / Server techniques

• SQL has the concept of null

• All data is in a table (like an APL matrix)

• SQL columns are named, not numbered

• SQL columns are “strongly typed”

• Relationships are data not structure

Page 14: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Client / Server Concepts

• Clients send requests to the server and it responds with the results

• Only the DBMS server touches the data

• It’s good to have the server do things like compute totals so that results are smaller

• The number of back-and-forth interactions is potentially more significant than the amount of data being transferred

Page 15: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL is the Language for Server Requests and Responses

• Your program sends SQL statements (only) to the server and handles the responses

• Some tools hide the fact that everything is done with SQL, by generating SQL for you

• SQL is very powerful for reading data (select) and deleting data (delete)

• SQL is quite stupidly simple for writing data in memory into a table (insert, update)

• The over-the-wire protocol is complex, but is hidden by (most) client libraries

Page 16: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

One Program Touches the Data

• A single coordinating program, the DBMS server, performs all access (reading and writing) to data files holding the SQL data

• Access to data is controlled by the server

• The server process can be run on a very powerful (expensive) computer

• Conflicting requests from different users can be handled more easily

Page 17: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Let the Server Do the Work

• The server should be a better computer than the one running your program

• It is optimal if the server can get the job done without sending a lot of data over the wire to your program

• Downside: the more you get the server to do the work (by sending SQL), the more you have to worry that things will need to change if you switch to another DBMS

Page 18: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Avoid “Chatty” Programs

• Each interaction with the server takes time for the back-and-forth protocol, in addition to whatever actual data is passed

• Rather than sending a series of steps to the server one after the other, it’s better to send one larger request

• SQL’s simplistic insert & update is an issue• Downside: the more you “bunch up” your

requests, the more difficult it is to handle errors

Page 19: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Concepts You Must Know

• SQL has only four data manipulation verbs

• RDBMSs use Client / Server techniques

• SQL has the concept of null

• All data is in a table (like an APL matrix)

• SQL columns are named, not numbered

• SQL columns are “strongly typed”

• Relationships are data not structure

Page 20: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL’s Concept Called null

• Intended to represent a “missing” or “unknown” value, rather than an “empty” value

• If a value is null, it is not equal to any particular value, nor is it not-equal to any particular value. Suppose A is an arbitrary SQL expression:

int_column = 5

char_column = ‘hello’

One would think that the result ofA or (not A)

has to be true but in SQL that’s not the case (!).

Page 21: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

null Causes “three valued logic”

• APLers think we understand booleans, but then along comes SQL’s notion of null that changes the rules we’ve learned

• When nulls are involved, the result of a boolean expression is one of true, false, or null

• The result of any expression is null if any value involved in the expression is null

• The SQL function coalesce returns the first not-null parameter value passed: coalesce(column, value_if_null)

Page 22: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Use Null Values Sparingly

• Allowing nulls sometimes makes sense– Definitely makes sense for date or time– Makes sense for numeric data when 0 is not

an appropriate “no information” replacement• Don’t use a “missing value code” in the database,

even if you use one in your APL data

– Any distinction between NULL and ‘’ (an empty string) is hard to see, justify, or explain to non-programmers (or programmers!).

• Sometimes it’s not your decision

Page 23: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Concepts You Must Know

• SQL has only four data manipulation verbs

• RDBMSs use Client / Server techniques

• SQL has the concept of null

• All data is in a table (like an APL matrix)

• SQL columns are named, not numbered

• SQL columns are “strongly typed”

• Relationships are data not structure

Page 24: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

All SQL Data is in Tables

• There is no other persistent data structure

• SQL table values are conceptual scalars– SQL can hold strings (enclosed char vectors)– Otherwise, no nested data (in standard SQL)

• If APL had scalar strings, a SQL table would be just like a simple APL matrix

• SQL locates rows using data values only– No notion like row number in an APL matrix

Page 25: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Columns

• Columns are named, not numbered

• Each column has a (scalar) data type

• Most data types can support the special SQL value null– Handling null values correctly can be difficult

Page 26: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Column Names

• One difference between products is in what column names they allow

• Some allow spaces in names, but that makes it necessary to “quote” those column names (so SQL can parse statements using them)

• Standard quoting techniques vary (!)• Advice: use simple, not-very-long names• It’s not always your choice

Page 27: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Concepts You Must Know

• SQL has only four data manipulation verbs

• RDBMSs use Client / Server techniques

• SQL has the concept of null

• All data is in a table (like an APL matrix)

• SQL columns are named, not numbered

• SQL columns are “strongly typed”

• Relationships are data not structure

Page 28: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Column Type Catgeories

• Numbers

• Strings

• Dates and Times

• Binary

• Time Intervals

• Boolean

Page 29: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Numbers

• Integers of various sizes (1, 2, 4, 8)– Usually no “unsigned integer”

• 4- or 8-byte floating point

• Fixed width and number of decimals

• Money sometimes available– 64-bit integer with 4 implied decimals

Page 30: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Strings

• Fixed length with maximum size

• Variable length with maximum size

• “Unlimited size” often available

• Multi-byte data (Unicode) often available

Page 31: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Dates and Times

• One or more of date, time, datetime• Concept of NULL is very useful here• Dates are stored as Julian date values

– Many support negative Julian dates

• Time values are stored as fraction of a day– 0.25 is 6am, 0.5 is noon, 0.75 is 6pm

• Datetime values stored as (day# + timefraction)• If only datetime is available,

– time can are represented as being “on day 0”– date can be represented as being “at midnight”

Page 32: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Time Intervals

• Supposedly standard

• Not universally available (SQL Server)

• Can be represented with a date or datetime value holding the result of subtracting one date or datetime value from another

Page 33: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Binary

• Fixed length with a maximum size

• Variable length with a maximum size

• “Unlimited size” often available

Page 34: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Boolean

• A single bit (!)– Not “fixed or variable length with a maximum

size” like string and binary data

• Sometimes boolean columns support null (so there are 3 states)

• My advice: use a one-wide character column instead

Page 35: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Time for a Break ?

Page 36: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Concepts You Must Know

• SQL has only four data manipulation verbs

• RDBMSs use Client / Server techniques

• SQL has the concept of null

• All data is in a table (like an APL matrix)

• SQL columns are named, not numbered

• SQL columns are “strongly typed”

• Relationships are data not structure

Page 37: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

How Relationships Work in SQL

• In APL, you can choose to represent related structures by nesting

• In SQL, there are only tables and there is no possibility of nesting

• Relationships are stored by having a column in one table store a value identifying a related row in another table (or null if there is no related row)

Page 38: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Table Design

• All columns in a table should represent information about one entity (or concept)– row = data about one instance– column = same data re many instances

• The term “field” is deprecated (but widely used)

• Each table needs a “primary key” column whose value can be stored in any other table that has a relationship with rows in this table

• SQL-generated primary key values allow relationships to be represented easily and can prevent issues that arise when users think they want to change key values

Page 39: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Relationships

• One to many (parent - child)

• Many to one (lookup)

• Hierarchy

• Many to many

Page 40: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Relationships – one to many

• Each (one) invoice can have any number (many) of line items– LineItem table has an InvoiceID column

holding Invoice table primary key (PK) value – “No line items for invoice N” is easily

represented by there being no rows in the LineItem table with InvoiceID = N

• An APL application might use nesting, storing a vector or matrix of lineitem data within each Invoice data structure

Page 41: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

One to Many = Parent - Child

• The “one” table (Invoice) is parent

• The “many” table (LineItem) is child

• Column holding primary key (PK) of another table is called a foreign key (FK) to (or into) that table

• Foreign keys are critical to competent SQL database design

Page 42: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Relationships – many to one• There can be any number (many) of employees

in each (one) department– Employee table has a column holding Department

table PK value (read as “FK to Department”)– “No employees in department N” is again easy

• Not thought of as parent / child; you don’t think of departments as “owning” employees the way invoices “own” line items

• An APL application could (but wouldn’t) use nesting, storing a vector of EmployeeID values (but not Employee data) for each department

Page 43: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Relationships – hierarchy

• Each employee has a manager; the manager is also an employee

• Employee table has a ManagerID column holding FK to the same (Employee) table

• Top of hierarchy is represented by null ManagerID value (no manager)

• APL could use nesting, but I haven’t

Page 44: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Relationships – many to many

• Requires a third table to hold two many-to-one relationships to the other two tables

• ClassStudent table records that– each class can have any number of students

enrolled in it– each student enrolls in any number of classes

• You can’t use nesting for this (without double-storing the information), so APL applications usually do it the same way

Page 45: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

More SQL Design Thoughts

• Advice: use your product’s “identity” or “sequence” feature to create PK values

• When in doubt, define another table

• Examples of extra tables that add flexibility

• Storing historical data (not just the current value, but changes over time)

Page 46: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Advice: Use DBMS-assigned Values as Primary Keys

• Most implementations support having the DBMS assign the value of a particular column to be 1+ the previous value, during insert of a new row

• Columns defined this way make excellent PKs that are efficient and easily used as FKs

• Columns that are user-visible “primary keys” (e.g. Product Code, Region Code, Department Code) become attributes– Changing them no longer causes pain and trouble

Page 47: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

When In (Any) Doubt, Define Another Table

• When you see a numeric suffix on a column name, that’s an indication that another table should have been built

• FK columns in an Operation table named Doctor1 Doctor2 Doctor3 Doctor4 would be a poor design– What if more than 4 doctors involved?– Better to have a child table with any number

(even zero) of doctors for an operation

Page 48: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

More Tables Are Often Better

• Some cases where using more tables results in a more flexible system:– Address table where both Customer and

Employee tables have AddressID column• Address table has column AddressType (FK to

AddressType table with descriptions Billing, Shipping, Home, Office…)

– Name table where Patient table has NameID• Column Primary in Name table marks the current

(primary) name to use for this person

Page 49: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Historical Data

• Data that changes over time where history needs to be maintained can be in a child table with a ValidUntil column, and possibly also a ValidSince column– The currently active child table row is marked by a

null ValidUntil value– Find the active child table row on date X:

(X >= coalesce( ValidSince, X )) ^(X <= coalesce( ValidUntil, X ))

– Code needs to ensure that timespans don’t overlap– Can store ID of active child table row in parent

Page 50: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Joins

• Because data values are used to define relationships, SQL needs to be able to combine (join) tables based on the defined FK-to-table relations – and it can do so quite flexibly

• Each SQL statement must re-specify the join condition(s); tools to generate SQL code can make this easier to get right

• SQL supports different kinds of joins:– Natural join– Left (or right) join– Cross join

Page 51: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Time for Lunch ?

Page 52: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Join is What in APL?• Similar to: • Natural join

– values of that would cause in APL instead remove those rows from A (!)

– if there are multiple rows that match in B, rows of A are repeated to match

• Left join: adds a row of null values to B to avoid losing rows of A with unmatched values:

• Cross join: like APL outer product (usually this

is a mistake) – all combinations of values

Page 53: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Relationships vs. Join Types

• Parent-child relationship = left join

• Lookup table = left join

• Natural join removes unmatched rows – make sure that’s what you want

• Many-to-many should normally be two left joins from the middle table, because it’s two combined one-to-many relationships

• Hierarchies are non-trivial to handle

Page 54: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Given the Issues, Why Use SQL?

• Standard outside the APL world

• Superb support for transactions

• Security is built-in

• Advanced database features would be very difficult to provide otherwise

Page 55: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Databases are Standard

• Wide availability of expertise

• Backup and recovery are solved problems

• Other parts of an application can be developed by others or using other tools

• Many non-APL tools can work with them– Logical and physical data modeling– Reporting– Performance analysis

Page 56: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Database Transactions

• ACID characteristics– Atomicity, Consistency, Isolation, Durability

• All-or-nothing data updates, ensuring data consistency even when there are errors during updates

• Each user’s work is not affected by the work of other users

• Server or application software crashes leave data intact after “a restart

Page 57: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Security Built In

• DBMS server controls all access to data– application can be the only way to data; or– can use Windows identity as SQL identity

• Access can be removed from the underlying tables, perhaps granting select access via views that use user identity

• Corporations are accustomed to securing their databases, but not their (APL) files

Page 58: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Advanced DBMS Features

• Flexible high-performance indexing• Views (including “materialized” views)• Sub-queries• Stored procedures• Triggers• XML data handling• Partitioned tables• Materialized views

Page 59: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Database Indexes

• Auxiliary data structures to speed access to data, like the index of a book

• Can include multiple columns

• A “unique index” prevents duplicates

• A “clustered index” changes the physical arrangement of rows– Clustering a child table by the parent key

places all child rows adjacent on disk

Page 60: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Database Views

• Almost any SQL select statement can be saved as a view

• Views act like virtual tables– There is no data in a view, only in the table or

tables referenced by the view– Views can be targeted by select statements

exactly as if they were physical tables– In some limited cases, views can be updated– Users can be granted access to a view rather than

to the table(s) named in the view

Page 61: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQL Subqueries

• SQL statements can define derived tables that exist only for that select– Like a view that exists for one statement

• Who has not purchased product X?select … from Customer where CustomerID not in

(select distinct inv.CustomerID

from Invoice as inv join LineItem as li

on inv.InvoiceID = li.InvoiceID

where li.Product = X)

Page 62: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Stored Procedures and Triggers

• Programs that run within the database– Stored Procedures (SPs) can be called from

client programs, or by other SPs– Triggers are a special type of SP that fire (run)

on any of insert, update, delete– Can be used to validate or audit

• Usually written a product-specific proprietary language– Writing in Java or .Net is becoming possible

Page 63: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Support for XML Data

• Products are vying for the best support

• XML data can of course be stored as variable-length character data, but that doesn’t require any special support

• Examples of XML support– Load data or create virtual tables from XML– Use XPath syntax to locate data within the

XML stored in a column, and use that data to select rows

Page 64: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Partitioned Tables

• Dividing a logical table, one defined by a union view, into separate physical tables

• The purpose is to divide the workload of accessing (or updating) the table between multiple servers

• Only appropriate for huge tables, or ones that become a bottleneck in a transaction processing system

Page 65: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Materialized Views

• Some views are better off being computed once and stored, rather than being virtual (and thus re-computed each time they’re accessed)

• If SQL can know how to update the view when the underlying data changes, this can speed up access to the view at the cost of storing the materialized view and making needed updates

• The views that are eligible to be materialized are fairly limited in most products today

Page 66: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

A Quick Look at Implementation

• Usually multiple tables in each physical file– flexible assignment of tables to files

• Each table’s data is a group of pages

• A page of data contains multiple rows

• Pages of a table are linked together

• Index data may be in file with table data

• Indexes reference rows by page and position within the page

Page 67: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Which database to choose?

• It’s often not your choice• If you can choose, what matters most to you?

– cost– reliability– availability of knowledgeable people– performance

• If performance is critical, test lesser-known products; some are amazingly fast for simpler usage scenarios yet quite robust and stable

Page 68: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Can You Choose?

• In corporate development, the data likely already exists in a particular database

• Only if you are building a new system (or a new product) are you given the opportunity to select the database platform

• Many customers care only that it’s “one of the big ones” (Oracle, SQL Server, DB2)

• To customers, the formula is often that the unknown is risky, and risk is bad

Page 69: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

What’s Most Important?

• Licensing costs vary dramatically– Pure open source is “free” for a reason

• no support unless you pay

– The “big three” cost an arm and a leg• but your company may already license it

– Lesser-known products can be a great value

• If it’s mission-critical, you want “big 3”– but it’s still your problem (and will be said to

be your fault) if the database fails

Page 70: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

What’s Most Important?

• Sometimes you’ll need to find expertise– Oracle is notorious for needing a guru– If any part of the application is built outside

APL, you’ll find people more easily if you go with one of the big boys

• Performance is rarely known to be critical– Starting out, you may have no idea if the

DBMS’s performance will be a problem– Except in extreme cases, you can handle it

Page 71: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Lesser-known Products

• I’m somewhat biased towards them– Sometimes support is unbelievably great– They’re swimming against the tide – help

show the world that they deserve to exist

• Using APL is already “weird” and “risky”– If your test results impress you as much as

they’ve impressed me, why not use what you think is the best tool, despite the “risk” of an unknown tool?

Page 72: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

APL-based or APL-aware?

• If you find a system that seems to fit well, and is either based on APL or has some special support for APL callers, why would you reject it automatically?

• But you might not really be “using a SQL RDBMS” in all such those cases, and it depends on why you’re leaning towards SQL in the first place

Page 73: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Time for a Break ?

Page 74: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

APL Can Speak to SQL Easily

• SQAPL for Dyalog APL (uses ODBC)– Also available for Dyalog Unix (not free, nor are the

best ODBC drivers for Unix)• DataDirect apparently has the best Unix ODBC drivers; they

are recommended by Dyalog (others work as well)

– Same API for APL+Win is called APL+Link

• ADO.Net – Windows database API #6• ADO (original), or OLE DB, via COM/OCX• Write directly to some API• Use a home-grown DLL, OCX, or AuxProc

Page 75: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQAPL Wins

• It’s very hard to prove that statement in every environment, with every workload

• Why it’s better– It’s the only interface that understands how APL

works and inherently supports its arrays– Other interfaces require major work to hide one-

row-at-a-time programming models

• Version 5 has a new mechanism that can provide a significant performance improvement (10x better has been reported) with a relatively small change

Page 76: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Other Choices Are Inferior

• Not APL oriented• Very “loopy” with “scalar thinking” APIs• Only recently have APIs (in .Net) reached parity

with SQAPL techniques from 5+ years ago!• Why bang your head against the wall when

SQAPL comes free with Dyalog APL?• SQAPL is a proven, tested, robust solution

– Put in something about the stupidity of trying to re-invent a tough, well-worn, smooth, very round wheel

Page 77: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQAPL Sets

• APL developers think of changes to data happening all at once, when an array value is stored (e.g. in a file component)

• Storing an APL array as SQL data frequently involves multiple SQL operations, because each SQL operation changes only one row at a time

• SQAPL Sets are an abstraction to make working with SQL data more APL-like

Page 78: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Basics of Using Sets

• If you plan to modify data, start by reading data using one of the “set” tools; the result is a “set handle” and a data matrix

• Modify the matrix by adding and/or deleting rows, and/or updating values

• Call “set update” to store the data, passing the set handle and the new matrix

Page 79: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

How Do Sets Work?

• The retrieved data is stored, associated with the “set handle” that is returned to you

• When you call “set update”, APL is used to compare the old and new “tables” and make the changes by generating the needed SQL statements and running them

• Sets are an extraordinarily simple but powerful idea; take advantage of them

Page 80: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

SQAPL’s Current “Sets” Implementation is Not Complete

• Should use SQL transactions– So the new version of the in-memory array will

be successfully stored in the database, or the database will be unchanged

• Could support DBMS-assigned “identity” values – As I’ve said, I think these are very useful– Returning the identity values back to APL is

very important (to avoid the need to re-query)

Page 81: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

More Possible Enhancements

• Fix a few bugs– SQL handles are not closed ASAP– Occasional inconsistent error handling (some

now fixed)

• Could easily support multi-column keys (but I prefer to use identity values as keys)

• Could support optimistic concurrency• Could support automatic updates to

related (child) tables

Page 82: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Database-Specific Tools

• SQAPL provides the tools to talk to various SQL databases. (Remember that they only understand SQL statements.)

• A desirable tool would be one to improve the performance of Set Update by passing data to be updated in fewer steps, perhaps as XML.

• With SQL Server in particular, sending XML data for insert and update can let them operate in bulk, rather than one row at a time. Impressive performance gains would be quite likely.

Page 83: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Improving SQAPL’s API

Working with an APL+Win client, I did major design and implementation work on an object-oriented layer built on the SQAPL code. This included making most of the enhancements discussed above, adding major features for calling stored procedures, creating and modifying tables, and much more. I hope to have a similar opportunity in Dyalog APL.

Page 84: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Performance Issues

• Most of these are SQL / DBMS issues, not issues of the APL interface to SQL– LAN vs WAN makes a huge difference

(latency)– Consider using a “middle tier” in WAN case

• The nature of these issues varies wildly from one DBMS to another– The same SQL statement may be very fast in

one implementation, very slow in another

Page 85: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Figuring Out What’s Wrong

• APL is the wrong tool for analyzing performance problems– APL can’t see what the DBMS is doing– APL can’t see the system-level slowdowns– Non-APL tools exist for no other purpose

• If the problem is “too much back and forth”, can you find a way to “blast” the data across in fewer steps?

Page 86: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Metadata Based Tools

• Most APL applications are based on, or at least use, metadata – data about the data being manipulated. APLers use these techniques naturally. It surprises me how infrequently other programmers in other languages seem to do so.

• Tools that know your database structure can be used for many things. For example, if the table hierarchy is available as metadata, a routine can be written to produce the SQL where clause to join an arbitrary list of tables.

Page 87: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Other Tools

• Knowing the datatype of database table columns can simplify creating UI screens.

• Knowing the relationship between tables can let (tools called by) model code retrieve related data automatically.

• Automating changes to the database schema given metadata is possible – I know, I’ve done it. (It’s not trivial!)

Page 88: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Lessons Learned

• Always (except that one should never say always) use identity values as PKs– Unless you can prove that you have one of

the (very rare, in my opinion) cases where there’s a real reason not to

– GUID PKs are an alternative to identity values, but …

• they are comparatively “fat” vs. integers• the problems with identity values are overstated• the “any location” benefit is overstated

Page 89: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

More Lessons

• Always write APL that writes SQL– Build tools to help you do so

• Don’t hide SQL from APL developers

• Table names should be singular (my opinion); not all people agree

• Try not to base your design on the ability of your (current) DBMS to handle it well

Page 90: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Unproven (by me) Ideas

• These are thoughts that haven’t been fully implemented (at least not by me in APL), so be aware of that before basing your development on them.

• They represents some potential “best practices” that borrow significantly from work done by others, mostly in other languages.

Page 91: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Model Definition is Often Hazy

• The line between UI, data model, and database can be unclear.

• Data manipulated by the UI of a form is most easily stored in attributes of the form.

• Many programs have a “UI data model” that’s separate from the model of the data as stored persistently (whether or not it’s stored in a SQL database).

Page 92: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Problems Caused by UI Models

• Changes must be coordinated– When model data is stored in the UI, making changes

to either the UI or the data definition forces them to change at the same time.

• UI validation code works from the UI model– Code for event-handling in the UI naturally does BR

(business rule) validation using UI-stored data.

• Validation logic gets repeated– The model must enforce BRs to prevent bad data

from being stored, so logic gets repeated.

Page 93: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Use Model Code in the UI

• When a model’s data is simple – and the power of APL to manipulate data structures makes that common when writing in APL – it can be tempting to do the manipulation directly.

• User interface code should avoid that, instead invoking methods of the model even though they may be trivial.

• Consider a web app – server-side code can’t rely on data being correct when submitted, so it must do validation.

• Code for browser-side validation either repeats the logic or consults the server. Ideally, the latter is done asynchronously (using an Ajax-style model).

Page 94: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

UI / Data Model Separation

• It is clear that the user interface code is best kept separate from (a client of) data models.

• This results in models that are easier to test, and easier to re-use in other contexts.

• When the UI needs to implement a Save action, it can be tempting to use your app’s data storage APIs directly in the code that gathers the data from where it’s stored in the UI.

• Better would be to call a model’s Save method.

Page 95: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

UI Forms Reference a Model

• UI forms should have a reference to a model object.

• The model should be able to hold data about multiple data instances (database table rows) – this is APL, after all.

• The model can have documented events to which the UI can subscribe.

• UI data should be passed to the model as soon as possible.

Page 96: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Unified Memory / Database Model

• The model should know about the state of the data it holds relative to the database.– Is it in the database (vs. new data)?

• If it is, has it changed since then?

– Is it currently valid (ready to be saved)?• If not, what error message data is there?

• One good source of design ideas is Scott Ambler’s ActiveRecord model. It is only slightly array-oriented, so it is not ideal unchanged.

Page 97: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

Writing Models

Write model methods on an as-needed basis. APL programmers often work to create tools that have much more functionality than is needed today. When writing models, put required functionality into lower-level tools (creating them as needed) and keep only high-level logic, and calls to the lower-level tools, in the models. Maximize the work not done.

Page 98: Using SQL Databases from APL (Dyalog & other) J. Merrill Analytical Software Corp. jamesmerrill@usa.net

A Big Thank You to Morten

I would not be here if Morten had not asked me to deliver this talk. I had not seen Dyalog APL in more than 20 years, having used only APL*PLUS (now APL+) for all that time. I appreciate seeing where Dyalog APL is today, and I’m always happy to visit Denmark. I’ve enjoyed what’s felt like a warm welcome from this community. I thank both you and Morten for your generosity.