
Bases de Dades: introduction to SQL (part 1)

Andrew D. Bagdanov
bagdanov@cvc.uab.es

Departamento de Ciencias de la Computación
Universidad Autónoma de Barcelona

Fall, 2010


Outline

1 Last week on bases de dades...

2 Goals of today’s lecture

3 Data and data-centricity

4 Iterated design

5 Exercises


Embedded databases

DB archaeology: Repeat my “locate *.sqlite” experiment on your (or a university) computer. Try to find more examples of embedded databases in applications. Other things you might search for are “*.sql” or “*.db”.
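On a machine without the locate command, the same search can be approximated with a short Python script. This is only a sketch and not part of the lecture: the helper name find_database_files and the PATTERNS tuple are assumptions made for illustration.

```python
from pathlib import Path

# Filename patterns suggested in the exercise; extend as needed.
PATTERNS = ("*.sqlite", "*.sql", "*.db")


def find_database_files(root, patterns=PATTERNS):
    """Return the sorted paths under `root` whose names match any pattern.

    `find_database_files` is a hypothetical helper, not a standard tool.
    """
    root = Path(root)
    found = set()
    for pattern in patterns:
        # rglob walks the directory tree below root recursively.
        found.update(p for p in root.rglob(pattern) if p.is_file())
    return sorted(found)
```

Calling find_database_files(Path.home()) would list candidate files under your home directory; on a large tree this can take a while, which is one reason locate keeps a prebuilt index.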


The nosql movement

Changing times: I mentioned “nosql” databases a few times, but never said what that means. Do some searching to discover what the principal ideas are behind the nosql movement.


Goals for today

Begin thinking about database design and how it affects database applications.
Introduce the high-level language SQL (Structured Query Language).
Familiarize ourselves with some of the main types of SQL statements.
Understand the importance of data modeling.


Design or use?

We have a problem of the chicken and the egg...
It’s hard to teach about database use before we know how a database is designed.
But it’s hard to design a database before we know something about how it will be used.
In this class we will follow an iterated design philosophy.
We will design a little (learning as we go), use a DB a little (learning as we go), and then re-design a little to fix any problems we encounter.
Pretty similar to the real world (but be careful...)


Standards, standards everywhere!

Someone once commented: The great thing about standards is that there are so many to choose from!
This is very true in the DBMS world, especially with SQL.
There have been many revisions and major releases of the ANSI SQL standard.
SQL:1999 is the most broadly supported; SQL:2008 is the latest version (as of Fall 2010).
No implementation is complete (there will be some missing features).
Every implementation implements custom extensions.
Read the manual.


Outline

Data and data centricity
  When data is central.
  Why use a database?

A simple case study
  But nonetheless a real one
  Choosing a data model

Expressing the design in SQL
Answering questions
Exercises


Data-centric applications

Many applications are inherently data-centric: they generate and consume large quantities of data.
Email clients, search engines, reservation systems, product catalogs, you name it: it’s all about the data.
How should this data be stored in a way that is efficient?
How should this data be stored so that it can be retrieved?
How should this data be stored so that it can be maintained?
How should this data be stored so that it can be flexibly retrieved and maintained in unforeseen ways?


Efficiency, standard API

DBMSs provide answers to all of those questions.
They are one of the most researched, advanced, sophisticated and reliable technologies in the world of computer science.
Unless you have a very good reason for doing otherwise, a DBMS should be the backbone of any data-centric application.

Andy’s tenth axiom: Any sufficiently complex data-centric application contains an ad hoc, informally-specified, bug-ridden, slow implementation of half a relational database management system. [1]

[1] With apologies to Greenspun and his tenth rule of programming.


Design decisions

Life in the database world is often a zero-sum game:

Compromise and cooperation are necessary.
Remember that you may have to wear any of these hats at any time...


A country profile database

Imagine an application (or group of applications) that must routinely deal with information about countries and the languages spoken in them.
It could be a GIS application, or a shipping address database, or just about anything.
What are the data modeling needs of such applications?
What information should we model about countries?
Assume for now that we are interested mostly in geopolitical and linguistic information.


Thought experiments

What types of queries will applications need to perform against this type of database?

What languages are spoken in a particular country (or group of countries)?
In what countries is a particular language spoken?
In how many countries is a language spoken?
What countries are part of a geopolitical region?
...


A first design: mono-tabular

Let’s assume we want to model the following information about a country:

1 Country name
2 Official ISO country code
3 The continent on which it is
4 The geopolitical region it belongs to
5 Language spoken

Hmmm... Already we run into problems.


Second run: mono-tabular

There are few countries in the world where only ONE language is spoken:

1 Country name
2 Official ISO country code
3 The continent on which it is
4 The geopolitical region it belongs to
5 Language spoken
6 Is language official?
7 Percentage of population speaking language


The data definition

Data (relations, actually) are represented as tables in a DBMS. Table columns represent the attributes; rows are instances of data. A description of the structure of all tables is called a database schema.

Our table description (each row corresponds to a column in the DB):

Column Name  Type        Null?  Default
Code         char(3)     no     ''
Name         char(52)    no     ''
Continent    enum        no     'Asia'
Region       char(26)    no     ''
Language     char(30)    no     ''
IsOfficial   enum        no     'F'
Percentage   float(4,1)  no     0.0


Some sqlite meta-commands

This is a small sample; the .help command is very useful:

Command          Action
.read <file>     execute sequence of SQL from file
.schema <table>  describe table structure
.dump <table>    dump SQL representing DB
.quit            quit sqlite3 CLI
.help            get help on meta commands
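When sqlite is driven from a program rather than from the sqlite3 CLI, the meta-commands are not available, but some have rough counterparts in Python's standard sqlite3 module. The sketch below is an illustration under stated assumptions, not part of the lecture: the Demo table and the SCRIPT string are hypothetical stand-ins for a real .sql file.

```python
import sqlite3

# Hypothetical script standing in for the contents of a .sql file.
SCRIPT = """
CREATE TABLE Demo (Code char(3) NOT NULL default '');
INSERT INTO Demo VALUES ('AFG');
"""

conn = sqlite3.connect(":memory:")

# Rough counterpart of ".read <file>": run a sequence of SQL statements.
conn.executescript(SCRIPT)

# Rough counterpart of ".schema <table>": the stored CREATE statement.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'Demo'"
).fetchone()[0]

# Rough counterpart of ".dump": SQL text that would recreate the database.
dump_lines = list(conn.iterdump())

print(schema)
print("\n".join(dump_lines))
```

The meta-commands themselves are interpreted by the CLI, not by the SQL engine, which is why they start with a dot and need no terminating semicolon.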


Our schema in SQL

CREATE TABLE Country (
  Code char(3) NOT NULL default '',
  Name char(52) NOT NULL default '',
  Continent enum NOT NULL default 'Asia',
  Region char(26) NOT NULL default '',
  Language char(30) NOT NULL default '',
  IsOfficial enum NOT NULL default 'F',
  Percentage float NOT NULL default 0.0
);

This SQL statement will create the 'Country' table.
Table structure expressed in a Data Definition Language (DDL).
The table is created in the current database.
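To check what the DDL actually produced, the statement can be run against a throwaway in-memory database and the resulting columns listed. The following is a minimal sketch using Python's standard sqlite3 module; the names CREATE_COUNTRY and columns are assumptions for illustration. Note that sqlite accepts arbitrary type names such as enum without enforcing a set of allowed values; other DBMSs may treat that column type differently.

```python
import sqlite3

# DDL from the slide, written with straight quotes.
CREATE_COUNTRY = """
CREATE TABLE Country (
  Code char(3) NOT NULL default '',
  Name char(52) NOT NULL default '',
  Continent enum NOT NULL default 'Asia',
  Region char(26) NOT NULL default '',
  Language char(30) NOT NULL default '',
  IsOfficial enum NOT NULL default 'F',
  Percentage float NOT NULL default 0.0
);
"""

# An in-memory database disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
conn.execute(CREATE_COUNTRY)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(Country)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "NULL ok", default)
```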


Inserting from data

Each row in a table must be INSERT-ed.
A row corresponds to a datum, or to a single element in the relation (using the set-theoretic formulation of relations).

INSERT INTO Country
VALUES ('AFG', 'Afghanistan',
        'Asia', 'Southern and Central Asia',
        'Balochi', 'F', 0.9);

...
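An INSERT without a column list assigns values by position, so the values must follow the column order of the CREATE TABLE statement. Naming the columns explicitly removes that dependency. The sketch below, using Python's standard sqlite3 module, inserts the Afghanistan row from the slide with an explicit column list and bound parameters; the variable names row and stored are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Country (
  Code char(3) NOT NULL default '',
  Name char(52) NOT NULL default '',
  Continent enum NOT NULL default 'Asia',
  Region char(26) NOT NULL default '',
  Language char(30) NOT NULL default '',
  IsOfficial enum NOT NULL default 'F',
  Percentage float NOT NULL default 0.0
)""")

# The row shown on the slide: continent 'Asia', region 'Southern and Central Asia'.
row = ("AFG", "Afghanistan", "Asia", "Southern and Central Asia",
       "Balochi", "F", 0.9)

# The column list makes the statement independent of the table's column order;
# the ? placeholders let sqlite3 bind the values safely.
conn.execute(
    "INSERT INTO Country "
    "(Code, Name, Continent, Region, Language, IsOfficial, Percentage) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    row,
)
conn.commit()

stored = conn.execute(
    "SELECT Continent, Region FROM Country WHERE Code = 'AFG'"
).fetchone()
print(stored)
```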


SQL syntax diagrams

SQL has a lot of syntax. Too much, some might say...
The sqlite reference manual has nice syntax diagrams to help (http://www.sqlite.org/lang.html).


A sample session

The SQL file used in this example is on the course website [2].

09:09:20> sqlite3 country.sqlite
SQLite version 3.6.22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .read mono_country.sql
sqlite>

[2] http://www.cvc.uab.es/~bagdanov/database/mono_country.zip


SELECT-ing data

The main way to retrieve data (rows) is through the SELECT statement.
Your SQL reference will “fall open” to the select page.
General form: SELECT <> FROM <> WHERE <>;

Each of the <> can be very, very complex.

Simple examples:

SELECT * FROM Country; /* ALL columns, ALL rows */

/* In which countries is Spanish spoken? */
SELECT Code,Name FROM Country WHERE Language='Spanish';
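To try these two queries without the course data file, they can be run against a handful of rows in an in-memory database. This is a sketch only, using Python's standard sqlite3 module: the rows in SAMPLE_ROWS describe fictional countries and are invented for illustration, not taken from mono_country.sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Country (
  Code char(3) NOT NULL default '',
  Name char(52) NOT NULL default '',
  Continent enum NOT NULL default 'Asia',
  Region char(26) NOT NULL default '',
  Language char(30) NOT NULL default '',
  IsOfficial enum NOT NULL default 'F',
  Percentage float NOT NULL default 0.0
)""")

# Hypothetical rows for fictional countries (one row per country/language pair).
SAMPLE_ROWS = [
    ("AAA", "Country A", "Europe", "Region 1", "Spanish", "T", 90.0),
    ("AAA", "Country A", "Europe", "Region 1", "Catalan", "F", 10.0),
    ("BBB", "Country B", "South America", "Region 2", "Spanish", "T", 95.0),
    ("CCC", "Country C", "North America", "Region 3", "English", "T", 80.0),
    ("CCC", "Country C", "North America", "Region 3", "Spanish", "F", 15.0),
]
conn.executemany("INSERT INTO Country VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

# ALL columns, ALL rows.
all_rows = conn.execute("SELECT * FROM Country").fetchall()

# In which countries is Spanish spoken?
spanish = conn.execute(
    "SELECT Code, Name FROM Country WHERE Language = 'Spanish'"
).fetchall()

print(len(all_rows))
print(sorted(spanish))
```

Without an ORDER BY clause the row order of a result is not guaranteed, which is why the sketch sorts the result before printing.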


More complex queries

/* In which countries is Spanish an official language? */
SELECT Code,Name FROM Country WHERE Language='Spanish'
  AND IsOfficial='T';

/* We can COUNT things too. */
SELECT count(*) FROM Country WHERE Language='Spanish';

/* On what continents is Spanish spoken? */
SELECT Continent FROM Country WHERE Language='Spanish';
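The three queries above can be exercised the same way on a few rows. The sketch below uses Python's standard sqlite3 module; the rows in SAMPLE_ROWS describe fictional countries and are invented for illustration. It also makes one property of the last query visible: it returns one row per matching table row, so a continent appears once for every Spanish-speaking country on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Country (
  Code char(3) NOT NULL default '',
  Name char(52) NOT NULL default '',
  Continent enum NOT NULL default 'Asia',
  Region char(26) NOT NULL default '',
  Language char(30) NOT NULL default '',
  IsOfficial enum NOT NULL default 'F',
  Percentage float NOT NULL default 0.0
)""")

# Hypothetical rows for fictional countries (one row per country/language pair).
SAMPLE_ROWS = [
    ("AAA", "Country A", "Europe", "Region 1", "Spanish", "T", 90.0),
    ("BBB", "Country B", "South America", "Region 2", "Spanish", "T", 95.0),
    ("CCC", "Country C", "North America", "Region 3", "English", "T", 80.0),
    ("CCC", "Country C", "North America", "Region 3", "Spanish", "F", 15.0),
    ("DDD", "Country D", "South America", "Region 2", "Spanish", "T", 99.0),
]
conn.executemany("INSERT INTO Country VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

# In which countries is Spanish an official language?
official = conn.execute(
    "SELECT Code, Name FROM Country "
    "WHERE Language = 'Spanish' AND IsOfficial = 'T'"
).fetchall()

# count(*) returns a single row with a single value.
spanish_count = conn.execute(
    "SELECT count(*) FROM Country WHERE Language = 'Spanish'"
).fetchone()[0]

# One result row per matching table row, so continents can repeat.
continents = [
    r[0] for r in conn.execute(
        "SELECT Continent FROM Country WHERE Language = 'Spanish'"
    )
]

print(sorted(official))
print(spanish_count)
print(sorted(continents))
```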


Key points

Things to take home:
  Data-centric applications
  Andy’s Tenth Axiom
  Basic sqlite interaction (meta-commands versus SQL statements).
  The CREATE TABLE, INSERT and SELECT SQL statements (basic versions).
  SQL syntax diagrams.

Next Week:
  Data Definition Language: specifying structure and constraints on tables.
  Refining the design of our case study: normalization, primary keys.
  More complex queries.


Exercises: lecture 2

A few exercises to do at home. Please come to the next problem session prepared to discuss your findings (items indicated in BOLD will be collected for grading):

1 DB creation: Download the 'mono_country.zip' file from the course website [3]. Duplicate my experiments with creating the 'Country' table and each of the sample queries I showed in the course.

2 Redundancy: there is a LOT of redundancy in our first design of the database (this is called an “unnormalized database”). In particular, all of the country information is repeated for each language in the country. What problems might this cause for the application programmer and the DB administrator? How might you fix this problem of redundancy?

[3] http://www.cvc.uab.es/~bagdanov/database/mono_country.zip


Exercises (TO BE COLLECTED 19 October)

3 Do it yourself: design (just design, do not implement) a data structure implementing the information about countries used in this lecture. What operations must be supported to implement all of the queries we examined today? What are the advantages and disadvantages of implementing your own data structures versus using a DBMS?

4 Distinct attributes: Write queries to determine:
  The number of distinct languages spoken.
  The number of distinct regions in which Spanish is spoken.
  The countries where Spanish is NOT an official language, but is spoken by more than 50% of the population.

5 Inserting new rows: if you search for the countries where Catalan is spoken, you will note that there are some missing entries (France and Italy, at least). Write INSERT statements to insert these missing entries into the DB. Show the new results of a search for Catalan-speaking countries.
