databases 101 - caltech astronomygeorge/aybi199/graham_db1.pdf · what is a database? astructured...

36
Databases 101 Matthew J. Graham CACR Methods of Computational Science Caltech, 3 May 2011 matew graham

Upload: truongtram

Post on 24-May-2018

220 views

Category:

Documents


3 download

TRANSCRIPT

Databases101

MatthewJ.GrahamCACR

MethodsofComputationalScienceCaltech, 3May2011

matthew graham

fliptrap

matthew graham

fliptrap

matthew graham

fliptrap

matthew graham

fliptrap

matthew graham

whatisadatabase?

A structuredcollectionofdataresidingonacomputersystemthatcanbeeasilyaccessed, managedandupdated

Dataisorganisedaccordingtoadatabasemodel

A DatabaseManagementSystem(DBMS) isasoftwarepackagedesignedtostoreandmanagedatabases

matthew graham

whyuseadbms?

dataindependence

efficientandconcurrentaccess

dataintegrity, securityandsafety

uniformdataadministration

reducedapplicationdevelopmenttime

dataanalysistools

matthew graham

scaleofdatabases

"DBs own the sweet spot of1GB to 100TB" (Gray&Hey,2006)

SQLite

MySQL,PostgreSQL

SQLServer, Oracle

∗Hive, HadoopDB

matthew graham

datamodels

A collectionofconceptsdescribinghowstructureddataisrepresentedandaccessed

Withinadatamodel, the schema isasetofdescriptionsofaparticularcollectionofdata

Theschemaisstoredina datadictionary andcanberepresentedinSQL,XML,RDF,etc.

Insemanticsadatamodelisequivalenttoanontology-"aformal, explicitspecificationofasharedconceptualisation"

matthew graham

fliptrap

matthew graham

flat(file)model

Datafilesthatcontainrecordswithnostructuralrelationships

Additionalinformationisrequiredtointerpretthesefilessuchasthefileformatproperties

Hollerith1889patent"ArtofCompilingStatistics"describeshoweveryUS residentcanberepresentedbyastringof80charactersandnumbers

Examples: delimited-separateddata, HTML table

matthew graham

relationalmodel

Dataisorganizedasrelations, attributesanddomains

A relationisatablewithcolumns(attributes)androws(tuples)

Thedomainisthesetofvaluesthattheattributesareallowedtotake

Withintherelation, eachrowisunique, thecolumnorderisimmaterialandeachrowcontainsasinglevalueforeachofitsattributes

ProposedbyE.F.Coddin1969/70

matthew graham

transactions

Anatomicsequenceofactions(read/write)inthedatabase

Eachtransactionhastobeexecuted completely andmustleavethedatabaseinaconsistentstate

Ifthetransactionfailsorabortsmidway, thedatabaseis"rolledback"toitsinitialconsistentstate

Example:AuthorisePaypaltopay $100formyeBaypurchase:-Debitmyaccount $100-Credittheseller'saccount $100

matthew graham

acid

Bydefinition, adatabasetransactionmustbe:

Atomic: allornothing

Consistent: nointegrityconstraintsviolated

Isolated: doesnot interferewithanyothertransaction

Durable: committed transaction effectspersist

matthew graham

concurrency

DBMS ensuresthatinterleavedtransactionscomingfromdifferentclientsdonotcauseinconsistenciesinthedata

Itconvertstheconcurrenttransactionsetintoanewsetthatcanbeexecutedsequentially

Beforereading/writinganobject, eachtransactionwaitsforalock ontheobject

Eachtransactionreleasesallitslockswhenfinished

matthew graham

locks

DMBS cansetandholdmultiplelockssimultaneouslyondifferentlevelsofthephysicaldatastructure

Granularity: atarowlevel, page(abasicdatablock), extent(multiplearrayofpages)orevenanentiretable

Exclusivevs. shared

Optimisticvs. pessimistic

matthew graham

logs

Ensuresatomicityoftransactions

Recoveringafteracrash, effectsofpartiallyexecutedtransactionsareundoneusingthelog

Logrecord:

-- Header(transactionID,timestamp, ...)-- ItemID-- Type-- Oldandnewvalue

matthew graham

partitions

Horizontal: differentrowsindifferenttables

Vertical: differentcolumnsindifferenttables(normalisation)

Range: rowswherevaluesinaparticularcolumnareinsideacertainrange

List: rowswherevaluesinaparticularcolumnmatchalistofvalues

Hash: rowswhereahashfunctionreturnsaparticularvalue

matthew graham

structuredquerylanguage

Appearedin1974fromIBM

Firststandardpublishedin1986; mostrecentin2008

SQL92istakentobedefaultstandard

Differentflavours:

Microsoft/Sybase Transact-SQLMySQL MySQLOracle PL/SQLPostgreSQL PL/pgSQL

matthew graham

create

CREATE DATABASE databaseNameCREATE TABLE tableName (name1type1, name2type2, . . . )

CREATE TABLE star (name varchar(20), ra float, dec float, vmag float)

Datatypes:• boolean, bit, tinyint, smallint, int, bigint;• real/float, double, decimal;• char, varchar, text, binary, blob, longblob;• date, time, datetime, timestamp

CREATE TABLE star (name varchar(20) not null, ra float default 0, ...)

matthew graham

keysCREATE TABLE star (name varchar(20), ra float, dec float, vmag float,

CONSTRAINT PRIMARY KEY (name))

A primarykeyisauniqueidentifierforarowandisautomaticallynotnull

CREATE TABLE star (name varchar(20), ..., stellarType varchar(8),CONSTRAINT stellarType_fk FOREIGN KEY (stellarType)REFERENCES stellarTypes(id))

A foreignkeyisareferentialconstraintbetweentwotablesidentifyingacolumninonetablethatreferstoacolumninanothertable.

matthew graham

insertINSERT INTO tableName VALUES(val1, val2, . . . )

INSERT INTO star VALUES('Sirius', 101.287, -16.716, -1.47)

INSERT INTO star(name, vmag) VALUES('Canopus', -0.72)

INSERT INTO starSELECT ...

matthew graham

delete

DELETE FROM tableName WHERE conditionTRUNCATE TABLE tableNameDROP TABLE tableName

DELETE FROM star WHERE name = 'Canopus'

DELETE FROM star WHERE name LIKE 'C_n%'

DELETE FROM star WHERE vmag > 0 OR dec < 0

DELETE FROM star WHERE vmag BETWEEN 0 and 5

matthew graham

update

UPDATE tableName SET columnName =val1WHERE condition

UPDATE star SET vmag = vmag + 0.5

UPDATE star SET vmag = -1.47 WHERE name LIKE 'Sirius'

matthew graham

select

SELECT selectionList FROM tableList WHERE conditionORDER BY criteria

SELECT name, constellation FROM star WHERE dec > 0ORDER by vmag

SELECT * FROM star WHERE ra BETWEEN 0 AND 90

SELECT DISTINCT constellation FROM star

SELECT name FROM star LIMIT 5ORDER BY vmag

matthew graham

joins

Innerjoin: combiningrelatedrows

SELECT * FROM star s INNER JOIN stellarTypes t ON s.stellarType = t.id

SELECT * FROM star s, stellarTypes t WHERE s.stellarType = t.id

Outerjoin: eachrowdoesnotneedamatchingrow

SELECT * from star s LEFT OUTER JOIN stellarTypes t ON s.stellarType = t.id

SELECT * from star s RIGHT OUTER JOIN stellarTypes t ON s.stellarType = t.id

SELECT * from star s FULL OUTER JOIN stellarTypes t ON s.stellarType = t.id

matthew graham

aggregatefunctions

COUNT,AVG,MIN,MAX,SUM

SELECT COUNT(*) FROM star

SELECT AVG(vmag) FROM star

SELECT stellarType, MIN(vmag), MAX(vmag) FROM starGROUP BY stellarType

SELECT stellarType, AVG(vmag), COUNT(id) FROM starGROUP BY stellarTypeHAVING vmag > 14

matthew graham

views

CREATE VIEW viewName AS . . .

CREATE VIEW region1View ASSELECT * FROM star WHERE ra BETWEEN 150 AND 170

AND dec BETWEEN -10 AND 10

SELECT id FROM region1View WHERE vmag < 10

CREATE VIEW region2View ASSELECT * FROM star s, stellarTypes t WHERE s.stellarType = t.id

AND ra BETWEEN 150 AND 170 AND dec BETWEEN -10 AND 10

SELECT id FROM regionView2 WHERE vmag < 10 and stellarType LIKE 'A%'

matthew graham

indexes

CREATE INDEX indexName ON tableName(columns)

CREATE INDEX vmagIndex ON star(vmag)

A clusteredindexisoneinwhichtheorderingofdataentriesisthesameastheorderingofdatarecords

Onlyoneclusteredindexpertablebutmultipleunclusteredindexes

TypicallyimplementedasB+treesbutalternatetypessuchasbitmapindexforhighfrequencyrepeateddata

matthew graham

cursorsDECLARE cursorName CURSOR FOR SELECT ...OPEN cursorNameFETCH cursorName INTO ...CLOSE cursorName

A cursorisacontrolstructureforsuccessivetraversalofrecordsinaresultset

Slowestwayofaccessingdata

matthew graham

cursorsexample

Foreachrowintheresultset, updatetherelevantstellarmodel

DECLARE @name varchar(20)DECLARE @mag floatDECLARE starCursor CURSOR FOR

SELECT name, AVG(vmag) FROM starGROUP BY stellarType

OPEN starCursorFETCH starCursor INTO @name, @magEXEC updateStellarModel @name, @mag / CALL updateStellarModel(@name, @mag)

CLOSE starCursor

matthew graham

triggersCREATE TRIGGER triggerName ONtableName ...

A trigger is procedural code that isautomatically executed in response tocertaineventsonaparticulartable:• INSERT• UPDATE• DELETE

CREATE TRIGGER starTrigger ON star FOR UPDATE ASIF @@ROWCOUNT = 0 RETURNIF UPDATE (vmag) EXEC refreshModels

GO

matthew graham

storedprocedures

CREATE PROCEDURE procedureName @param1type, . . .AS ...

CREATE PROCEDURE findNearestNeighbour @starName varchar(20) ASBEGIN

DECLARE @ra, @dec floatDECLARE @name varchar(20)SELECT @ra = ra, @dec = dec FROM star WHERE name LIKE @starNameSELECT name FROM getNearestNeighbour(@ra, @dec)

END

EXEC findNearestNeighbour 'Sirius'

matthew graham

normalisation

Firstnormalform: norepeatingelementsorgroupsofelementstablehasauniquekey(andnonullablecolumns)

Secondnormalform: nocolumnsdependentononlypartofthekey

StarName Constellation Area

Thirdnormalform: nocolumnsdependentonothernon-keycolumns

StarName Magnitude Flux

matthew graham

programming

Java

import java.sql.*...String dbURL = "jdbc:mysql://127.0.0.1:1234/test";Connection conn = DriverManager.getConnection(dbUrl, "mjg", "mjg");Statement stmt = conn.createStatement();ResultSet res = stmt.executeQuery("SELECT * FROM star");...conn.close();

matthew graham

programming

Python:

import MySQLdbCon = MySQLdb.connect(host="127.0.0.1", port=1234, user="mjg",

passwd="mjg", db="test")Cursor = Con.cursor()sql = "SELECT * FROM star"Cursor.execute(sql)Results = Cursor.fetchall()...Con.close()

matthew graham