Introduction to Structured Query Language and Databases for Researchers Patrick Bills, Research Consultant Institute for Cyber-Enabled Research, MSU

Post on 17-Jan-2022


Page 1: Introduction to Structured Query Language and Databases

Introduction to Structured Query Language and Databases for Researchers

Patrick Bills, Research Consultant

Institute for Cyber-Enabled Research, MSU

Page 2: Introduction to Structured Query Language and Databases

What is this Class?

3-hour workshop introducing Structured Query Language (SQL)

Understand different ways that SQL is used. Light on theory, heavy on practice:

SQL as a synonym for the relational data model

a method for efficient data entry

the SQL language for analytics

software to interact with relational databases in files or on servers

Example SQL from the MSU Mara Hyena Project, Dr. Kay Holekamp


Page 3: Introduction to Structured Query Language and Databases

Compare SQL to Spreadsheets

Spreadsheets

• are free-form: data can be in a table, but can be anywhere, with multiple tables per sheet.

• Each ‘cell’ of spreadsheet can contain anything

• Rows can get mixed up: sorting one column and not the others

• Calculations (functions) on cells are stored in a new cell

SQL Databases

• only separate tables; rows always stay together. A row is also called a “record”

• each field (column) should hold just one type of data.

• Calculations are done with SQL, and always result in a table

• SQL queries can be saved as “views”


Page 4: Introduction to Structured Query Language and Databases

How we will work

Computer TEXT indicates SQL code I expect you to type

Don’t type commands preceded by # or -- (these are code comments)

The sticky notes are provided to help us help you. When doing exercises, start by taking your sticky off, then place a green or red sticky note on top of your laptop:

No Sticky I am working

Green I am done and ready to move on (yea!)

Red I am stuck and need some help (or more time)


Page 5: Introduction to Structured Query Language and Databases

How we will work: software

There are many ways to use SQL; today we will work with desktop software

Make sure you have SQLiteStudio installed

Free, but funky: a little tricky to work with, but the best free database program on Windows, Mac, and Linux

There are many brands of SQL database formats: this uses SQLite

You may have heard of working with a database server, but we can also work with a database file


Page 6: Introduction to Structured Query Language and Databases

Part 1. Using SQL on a single table


Page 7: Introduction to Structured Query Language and Databases

Example SQLite file: Ecology Lab Portal DB

1. Download the SQLite file from http://git.io/vcL6Z (also available at https://github.com/billspat/portaldb/blob/master/sql_class_one_table.sqlite3?raw=true)

2. Confirm that the file was downloaded as sql_class_one_table.sqlite3

3. Move it to your Documents folder or desktop if you like


Page 8: Introduction to Structured Query Language and Databases

Open File and examine

Start SQLiteStudio

Open “Database” menu and select “Add database”

click the file folder and find sql_class_one_table.sqlite3

note: the Green (+) is to create a new database; we are opening one


Page 9: Introduction to Structured Query Language and Databases

How to View Tables with SQLiteStudio

• Select a table on the left and click the ‘triangles’ to display its properties

• Double-click a column for details

• Ignore Indexes and Triggers for now

• The right side of the screen also shows column details


Tables and Views

Page 10: Introduction to Structured Query Language and Databases

How to View Tables with SQLiteStudio

• Click the [Data] tab on the top right to display the data view

• Scroll horizontally and vertically to see more data

• Click column headings to sort the data grid

• 1000 rows are retrieved at a time for efficiency; the data has more than 34,000 rows


Page 11: Introduction to Structured Query Language and Databases

Explore the data

• date in 3 columns (month, day, year), all integers

• plot code as a numeric ID code

• species as a code, scientific name, and taxa (group)

• measurements: numeric or blank (null)


To prepare to use SQL to work with data from the db, we have to know what is in it.

Page 12: Introduction to Structured Query Language and Databases

About this database


Long-term monitoring and experimental manipulation of a rodent community in the Chihuahuan Desert near Portal, Arizona.

• J. H. Brown, University of New Mexico; T. J. Valone, Saint Louis University; S. K. Morgan Ernest, Utah State University

• 24 experimental plots were established in 1977 in the Chihuahuan desert ecosystem near Portal, Arizona with various experimental manipulations over the years

• Monitoring of the composition and abundances of ants, plants, and rodents has occurred continuously on all 24 plots.

• From 1977–2002, individual-level data on rodents (i.e., species, sex, size, reproductive condition) was collected monthly for each plot.

• From 1980–2002, precipitation was recorded at the study site.

Page 13: Introduction to Structured Query Language and Databases

Goal: Summarize ecological data for Onychomys spp.

How much data do we have? When and where was it collected? How often was it observed in the 24 plots? What were the average body measurements? Bonus points: can we reuse our work for other species?

Dr. Ashlee Rowe studies the tolerance of carnivorous grasshopper mice (Onychomys torridus) to scorpion venom. Can we cite these Portal, AZ data for body measurements?

Page 14: Introduction to Structured Query Language and Databases

SQL Commands with SQLiteStudio

Before we can use SQL to answer these questions, we need to learn where to put SQL in the SQLiteStudio program:

• From the “Tools” menu, select “open SQL Editor”

• In the query window type SQL code.

• Press the Blue Triangle in upper left to ‘run’ the SQL.

• Data appear in Grid below

• Status or error messages appear below that

Page 15: Introduction to Structured Query Language and Databases

SQL Commands with SQLiteStudio

From the “Tools” menu, select “open SQL Editor”

In the query window type SQL code.

Press the Blue Triangle in upper left to ‘run’ the SQL.

Data appear in Grid below

1. Type SQL here

2. Press this

3. Results

4. Status or error

Page 16: Introduction to Structured Query Language and Databases

Selecting Data with SQL

Goal: use SQL to show all rows for some columns

Enter the following SQL in the SQL Editor Window:

select year, plot from plot_surveys;

Note: SQLiteStudio shows 1000 rows per page. Use the pagination buttons to see more

Show more columns:

select year, plot, genus, species_id from plot_surveys;


Page 17: Introduction to Structured Query Language and Databases

Selecting Data with SQL

The basic select statement template is

SELECT column1, column2, … FROM table;

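SQL keywords (and, in SQLite, table and column names) are not case sensitive; keywords are written in upper case by convention. Text values are case sensitive: 'Onychomys' is not the same as 'onychomys'.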

SQL can have line breaks and white space

Column or table names may need to be quoted if they contain spaces or special characters. In SQLite, use double quotes for names (single quotes are for text values):

SELECT "favorite thing", fav_id FROM "a few of my!";

Our database uses simple column and table names

Page 18: Introduction to Structured Query Language and Databases

Selecting Data with SQL

Goal: format output of SQL for better exporting

rename columns with the “as” keyword

select Year, plot as "Plot Number" from plot_surveys;

You can include ALL columns with the special character *

select * from plot_surveys;


Page 19: Introduction to Structured Query Language and Databases

Calculating with Arithmetic

Goal: convert weight from grams to ounces

SQL supports arithmetic with + - * /

select year, plot, species_id, weight*0.035274 as weight_oz from plot_surveys;

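Bonus: convert to kilograms instead. Divide by 1000.0 rather than 1000: if weight is stored as an integer, SQLite does integer division and weight/1000 truncates every weight under 1 kg to 0.

select year, plot, species_id, weight/1000.0 as weight_kg from plot_surveys;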
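The integer-division pitfall can be checked with a small sketch. It runs both kilogram queries through Python's built-in sqlite3 module against an in-memory table; Python, the cut-down column list and the two sample weights are assumptions for illustration only, not part of the Portal database or the workshop setup.

```python
import sqlite3

# Hypothetical, cut-down plot_surveys table in memory (not the Portal data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plot_surveys (year INTEGER, plot INTEGER, species_id TEXT, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?, ?)",
    [(1977, 2, "OL", 40), (1982, 4, "OL", 56)],  # made-up sample rows
)

# integer / integer: SQLite does integer division, so 40/1000 and 56/1000 both give 0
# (ORDER BY just keeps the output order predictable)
int_div = conn.execute(
    "SELECT weight/1000 FROM plot_surveys ORDER BY year"
).fetchall()

# 1000.0 is a REAL value, so the result keeps its decimals
real_div = conn.execute(
    "SELECT year, plot, species_id, weight/1000.0 AS weight_kg "
    "FROM plot_surveys ORDER BY year"
).fetchall()

print(int_div)   # [(0,), (0,)]
print(real_div)  # [(1977, 2, 'OL', 0.04), (1982, 4, 'OL', 0.056)]
```

The same two SELECT statements can be pasted into the SQLiteStudio SQL Editor unchanged.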


Page 20: Introduction to Structured Query Language and Databases

Calculating with Functions

Goal: remove extra decimals from the conversion.

Round off converted value with the round() function.

select year, plot, species_id, round(weight*0.035274,2) as weight_oz from plot_surveys;

functions take the form of function(argument1, argument2, etc)

The rounding function form is round(expression, <number of decimals>)

and can be combined by nesting : function1(function2(column))

Most other SQL databases support math functions such as log() and cos(), but SQLite traditionally leaves them out to keep the library small (SQLite 3.35 and later can include them as a compile-time option).
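A minimal sketch of round(), again through Python's sqlite3 module on a hypothetical two-row table (made-up weights, not Portal data). It shows the per-row conversion and one nested call, where round() wraps the aggregate avg() that appears later in the workshop.

```python
import sqlite3

# Hypothetical two-row table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plot_surveys (year INTEGER, plot INTEGER, species_id TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?, ?)",
    [(1977, 2, "OL", 40), (1982, 4, "OL", 56)],
)

# round(expression, number_of_decimals): grams -> ounces, kept to 2 decimals
per_row = conn.execute(
    "SELECT year, round(weight * 0.035274, 2) AS weight_oz FROM plot_surveys ORDER BY year"
).fetchall()

# Nesting, function1(function2(column)): round() applied to the result of avg()
nested = conn.execute(
    "SELECT round(avg(weight) * 0.035274, 2) FROM plot_surveys"
).fetchone()[0]

print(per_row)  # [(1977, 1.41), (1982, 1.98)]
print(nested)   # 1.69
```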


Page 21: Introduction to Structured Query Language and Databases

Combining Characters with || operator

Goal: put both genus and species in one column of output.

Character fields can be combined with the || operator: 'A' || 'B' gives 'AB'. Add a space with ' '.

select Year, plot as "Plot Number", genus || ' ' || species as "Species" from plot_surveys;

Bonus: how would you combine the date columns into 1977-3-1? year || '-' || month || '-' || day as "Survey Date"
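The sketch below runs the concatenation on one hypothetical row via Python's sqlite3 module (an assumption; SQLiteStudio gives the same result). Plain || does not pad single-digit months and days, whereas the printf() pattern used later in this workshop does, which is why the padded form is shown alongside.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plot_surveys (year INTEGER, month INTEGER, day INTEGER, "
    "plot INTEGER, genus TEXT, species TEXT)"
)
# One hypothetical row, for illustration only.
conn.execute("INSERT INTO plot_surveys VALUES (1977, 7, 16, 2, 'Onychomys', 'torridus')")

row = conn.execute("""
    SELECT genus || ' ' || species                   AS "Species",
           year || '-' || month || '-' || day        AS "Survey Date",  -- numbers become text
           printf('%4d-%02d-%02d', year, month, day) AS padded_date     -- zero-padded month/day
    FROM plot_surveys
""").fetchone()

print(row)  # ('Onychomys torridus', '1977-7-16', '1977-07-16')
```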


Page 22: Introduction to Structured Query Language and Databases

SQL Filtering with “WHERE”

Add a WHERE clause to a SQL statement to add conditions:

SELECT column1, column2, etc FROM table WHERE conditions;

Try this: list only rows for one genus:

select year, month, plot, genus, species, weight from plot_surveys where genus = 'Onychomys';


Page 23: Introduction to Structured Query Language and Databases

SQL Filtering with “WHERE”

A WHERE clause can contain:

• numeric comparisons

• character comparisons (using single quotes)

• combinations of logical conditions, using AND and OR

Try this: find “heavy” grasshopper mice (40 g or more):

select year, month, plot, genus, species, weight from plot_surveys where genus = 'Onychomys' and weight >= 40;
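A small, hypothetical check of how the WHERE clause combines conditions, run with Python's sqlite3 module on made-up rows. The row whose weight is NULL is dropped because NULL >= 40 is not true, and the last query shows that text comparisons are case sensitive in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plot_surveys (year INTEGER, month INTEGER, plot INTEGER, "
    "genus TEXT, species TEXT, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?, ?, ?, ?)",
    [  # made-up rows, not Portal data
        (1982, 7, 4, "Onychomys", "leucogaster", 56),
        (1983, 7, 3, "Onychomys", "torridus", 30),    # too light
        (1984, 5, 2, "Dipodomys", "merriami", 45),    # wrong genus
        (1985, 6, 1, "Onychomys", "torridus", None),  # NULL weight
    ],
)

# Both conditions must be true for a row to be kept.
heavy = conn.execute(
    "SELECT year, month, plot, genus, species, weight FROM plot_surveys "
    "WHERE genus = 'Onychomys' AND weight >= 40"
).fetchall()

# Text comparisons use exact (case-sensitive) matching by default in SQLite.
lowercase = conn.execute(
    "SELECT count(*) FROM plot_surveys WHERE genus = 'onychomys'"
).fetchone()[0]

print(heavy)      # [(1982, 7, 4, 'Onychomys', 'leucogaster', 56)]
print(lowercase)  # 0
```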


Page 24: Introduction to Structured Query Language and Databases

Re-using SQL: saving SQL files

We’d like to re-use the SQL we are writing. There are several ways to do that.

Save a SQL file: a plain text file with one or more SQL commands

In SQLiteStudio, right click to select “save” or “open”

SQL files can have multiple SQL statements

Demo: download and open SQL file

Page 25: Introduction to Structured Query Language and Databases

Database Views : Reusable SQL

Goal: we want to keep some SQL in our database so that:

• we can keep it together with the tables in one file;

• it can be shared in the database; and

• it can be used like another table (for more SQL)

• it is a shortcut for often-used WHERE conditions, such as

• select * from plot_surveys where genus = 'Onychomys';

Page 26: Introduction to Structured Query Language and Databases

database ‘Views’ in practice

Goal: save our species filter for re-use, as a query shortcut

1. Develop SELECT statement

SELECT * FROM plot_surveys WHERE genus = 'Onychomys';

2. Prefix statement with “create…” to make a view

CREATE VIEW os_surveys AS SELECT * FROM plot_surveys WHERE genus = 'Onychomys';

3. Test. Double click the view to open, like a table

4. Use. Incorporate this view just as you would a table

SELECT * FROM os_surveys WHERE year = 1989;
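The view steps can be sketched end to end with Python's sqlite3 module and an in-memory table of made-up rows (the table layout is an assumption). The last query illustrates that a view stores the query rather than a copy of the data, so rows added to plot_surveys later appear in os_surveys automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plot_surveys (year INTEGER, plot INTEGER, genus TEXT, species TEXT, weight INTEGER)"
)
conn.executemany("INSERT INTO plot_surveys VALUES (?, ?, ?, ?, ?)", [  # hypothetical rows
    (1989, 3, "Onychomys", "torridus", 25),
    (1989, 5, "Dipodomys", "merriami", 44),
    (1990, 3, "Onychomys", "leucogaster", 38),
])

# Step 2: save the filter as a view.
conn.execute("CREATE VIEW os_surveys AS SELECT * FROM plot_surveys WHERE genus = 'Onychomys'")

# Step 4: query the view exactly as if it were a table.
rows = conn.execute("SELECT * FROM os_surveys WHERE year = 1989").fetchall()
print(rows)  # [(1989, 3, 'Onychomys', 'torridus', 25)]

# The view re-runs its query each time, so a newly inserted row shows up in it.
conn.execute("INSERT INTO plot_surveys VALUES (1989, 7, 'Onychomys', 'torridus', 27)")
count_1989 = conn.execute("SELECT count(*) FROM os_surveys WHERE year = 1989").fetchone()[0]
print(count_1989)  # 2
```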


Page 27: Introduction to Structured Query Language and Databases

More ‘Views’ in practice

Goal: format the date nicely for this view

Dates take complex functions to work with. Let’s save our date in a format that SQLite likes (YYYY-MM-DD) with the printf() function.

1. Develop SELECT statement that combines date columns into new column, along with remaining columns.

SELECT printf('%4d-%02d-%02d', year, month, day) as "survey_date", * FROM plot_surveys WHERE genus = 'Onychomys';

2. Prefix statement with “create…” to make a view

CREATE VIEW os_surveys AS SELECT printf('%4d-%02d-%02d', year, month, day) as "survey_date", * FROM plot_surveys WHERE genus = 'Onychomys';

3. ERROR! Why? We can’t create a view that already exists. We need to “drop” the view first, then run the CREATE VIEW statement again:

DROP VIEW IF EXISTS os_surveys;
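A sketch of the drop-and-recreate cycle, assuming Python's sqlite3 module and a hypothetical one-row table. The second CREATE VIEW fails while the old view exists; after DROP VIEW IF EXISTS (with IF EXISTS placed before the view name) the same statement succeeds. The exact wording of the error message may vary between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plot_surveys (year INTEGER, month INTEGER, day INTEGER, genus TEXT)")
conn.execute("INSERT INTO plot_surveys VALUES (1977, 7, 16, 'Onychomys')")  # hypothetical row
conn.execute("CREATE VIEW os_surveys AS SELECT * FROM plot_surveys WHERE genus = 'Onychomys'")

new_view = """
CREATE VIEW os_surveys AS
SELECT printf('%4d-%02d-%02d', year, month, day) AS survey_date, *
FROM plot_surveys WHERE genus = 'Onychomys'
"""

error_message = None
try:
    conn.execute(new_view)  # fails: a view with this name already exists
except sqlite3.OperationalError as err:
    error_message = str(err)

conn.execute("DROP VIEW IF EXISTS os_surveys")  # no error even if the view were missing
conn.execute(new_view)                          # now succeeds
row = conn.execute("SELECT survey_date, year FROM os_surveys").fetchone()

print(error_message)  # e.g. "view os_surveys already exists"
print(row)            # ('1977-07-16', 1977)
```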


Page 28: Introduction to Structured Query Language and Databases

Empty data: NULLS

Goal : filter out empty data

SQL databases have a special value for fields when no data is present : NULL

You can find or ignore those with the “is null” or “is not null” where conditions.

Page 29: Introduction to Structured Query Language and Databases

Empty data — NULL values

Weights of rodents were not collected for all years, so we need to filter out rows with no weights.

Find rows where weight is not null:

select year, month, genus, species, weight from plot_surveys where weight is not null;
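A hypothetical two-row example (Python's sqlite3 module, made-up data) of filtering NULLs. It also shows why IS NULL and IS NOT NULL are needed: a comparison such as weight = NULL evaluates to NULL rather than true, so it matches no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plot_surveys (year INTEGER, month INTEGER, genus TEXT, species TEXT, weight INTEGER)"
)
conn.executemany("INSERT INTO plot_surveys VALUES (?, ?, ?, ?, ?)", [
    (1977, 7, "Onychomys", "torridus", None),  # weight not recorded: Python None is stored as NULL
    (1982, 7, "Onychomys", "leucogaster", 56),
])

with_weight = conn.execute(
    "SELECT year, month, genus, species, weight FROM plot_surveys WHERE weight IS NOT NULL"
).fetchall()
missing = conn.execute("SELECT count(*) FROM plot_surveys WHERE weight IS NULL").fetchone()[0]

# "= NULL" never matches: comparing anything with NULL gives NULL, not true.
equals_null = conn.execute("SELECT count(*) FROM plot_surveys WHERE weight = NULL").fetchone()[0]

print(with_weight)  # [(1982, 7, 'Onychomys', 'leucogaster', 56)]
print(missing)      # 1
print(equals_null)  # 0
```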


Page 30: Introduction to Structured Query Language and Databases

Summarize with Aggregate Functions

How many rows of data do we have for the species we are interested in?

We often want to summarize several rows: sum, average, count, etc.

SQL has “Aggregate functions” for basic calculations per group.

Goal: count how many animals of our species were caught per year. Use count() and group by year:

select year, count(record_id) from plot_surveys where genus = 'Onychomys' group by year;


Page 31: Introduction to Structured Query Language and Databases

Summarize with Aggregate Functions

Goal: take the average weight of Grasshopper mice caught per year

Use the avg() aggregate function - must also use “group by” clause for each year

select year, avg(weight) from plot_surveys where genus = 'Onychomys' group by year;

But what about when the weight is not taken (is null)?

select year, avg(weight) from plot_surveys where genus = 'Onychomys' and weight is not null group by year;

Same answer, because avg() ignores nulls automatically


Page 32: Introduction to Structured Query Language and Databases

Aggregate functions in practice

Goal: refine our results by adding a count of samples (n) and better formatting

Count() the rows with weights, round the average

select year, round(avg(weight),2) as weight_avg, count(record_id) as n from plot_surveys where genus = 'Onychomys' and weight is not null group by year;

Unlike avg(), count(record_id) counts every row, including rows where weight is null, so we need to exclude those rows in the WHERE clause (count(weight) would skip null weights on its own)
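How avg() and the different forms of count() treat NULLs can be seen on three made-up rows, one of them missing a weight. The sketch assumes Python's sqlite3 module and a simplified table; the behaviour it shows (aggregates over a column skip NULLs, count(*) does not) is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plot_surveys (record_id INTEGER, year INTEGER, genus TEXT, weight INTEGER)")
conn.executemany("INSERT INTO plot_surveys VALUES (?, ?, ?, ?)", [  # hypothetical rows
    (1, 1980, "Onychomys", 20),
    (2, 1980, "Onychomys", 31),
    (3, 1980, "Onychomys", None),  # weight missing
])

row = conn.execute("""
    SELECT round(avg(weight), 2) AS weight_avg,   -- NULL weights are ignored
           count(*)              AS all_rows,     -- counts every row
           count(record_id)      AS n_record_id,  -- record_id is never NULL here
           count(weight)         AS n_weighed     -- skips NULL weights
    FROM plot_surveys
    WHERE genus = 'Onychomys'
    GROUP BY year
""").fetchone()

print(row)  # (25.5, 3, 3, 2)
```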

Bonus: create new column with year and month, group by both, and plot the time series of avg weight by month


Page 33: Introduction to Structured Query Language and Databases

Average weights for Grasshopper mice

Bonus: add the month field, and plot the time series

select year || '-' || month as year_month, round(avg(weight),2) as weight_avg from plot_surveys where genus = 'Onychomys' group by year, month;

[Chart: average weight (g) of Onychomys by survey month, 1978-7 to 2001-8]


Page 34: Introduction to Structured Query Language and Databases

Ordering tables

Database tables are not stored in any particular order.

Rows are sorted as needed, or have an order based on key columns (more later). Without ORDER BY, the order of query results is not guaranteed.

Order results in queries using ORDER BY, placed after GROUP BY if present:

SELECT <columns>
FROM <table>
WHERE <condition1 and/or condition2 ….>
GROUP BY <columns>
ORDER BY <columns>

Page 35: Introduction to Structured Query Language and Databases

ORDER BY in practice

What are the largest and smallest weights of Onychomys spp.? Examine the range of weights of Onychomys by sorting from lightest to heaviest, then the other way:

select weight, plot, year, month, genus, species from plot_surveys where genus = 'Onychomys' order by weight;

Sorting is ‘ascending’ by default (in SQLite, rows with a null weight sort first), but the DESC keyword will reverse that:

select weight, plot, year, month, genus, species from plot_surveys where genus = 'Onychomys' order by weight DESC;

weight  plot  year  month  genus      species
56      4     1982  7      Onychomys  leucogaster
56      14    1985  7      Onychomys  leucogaster
51      5     1981  3      Onychomys  leucogaster
49      3     1983  7      Onychomys  leucogaster

Bonus: Use aggregate functions max() and min() to find that instead…


Result in descending order

Page 36: Introduction to Structured Query Language and Databases

order by vs. max() or min()

What are the largest and smallest weights of Onychomys spp.?

Use aggregate functions max() and min() to find that

select genus, species, max(weight), min(weight) from plot_surveys where genus = 'Onychomys' group by genus, species;

genus      species      max  min
Onychomys  leucogaster  56   10
Onychomys  sp.          24   18
Onychomys  torridus     46   5

Result doesn’t tell WHICH rows have the max/min, just the values
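The difference between the two approaches can be sketched on three hypothetical rows with Python's sqlite3 module. max() and min() return only the values (and skip NULLs), while ORDER BY keeps whole rows, so the first sorted row tells you which survey it was; the last query shows that an ascending sort puts a NULL weight first in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plot_surveys (weight INTEGER, plot INTEGER, year INTEGER, genus TEXT, species TEXT)"
)
conn.executemany("INSERT INTO plot_surveys VALUES (?, ?, ?, ?, ?)", [  # made-up rows
    (56, 4, 1982, "Onychomys", "leucogaster"),
    (10, 9, 1990, "Onychomys", "leucogaster"),
    (None, 2, 1977, "Onychomys", "leucogaster"),
])

# Aggregates: the values only, NULL ignored.
extremes = conn.execute(
    "SELECT genus, species, max(weight), min(weight) FROM plot_surveys "
    "WHERE genus = 'Onychomys' GROUP BY genus, species"
).fetchall()

# Sorting keeps whole rows: the first row of a DESC sort is the heaviest animal's survey.
heaviest = conn.execute(
    "SELECT weight, plot, year FROM plot_surveys WHERE genus = 'Onychomys' ORDER BY weight DESC"
).fetchone()

# Ascending sort: SQLite treats NULL as smaller than any value, so it comes first.
first_ascending = conn.execute(
    "SELECT weight, plot, year FROM plot_surveys WHERE genus = 'Onychomys' ORDER BY weight"
).fetchone()

print(extremes)         # [('Onychomys', 'leucogaster', 56, 10)]
print(heaviest)         # (56, 4, 1982)
print(first_ascending)  # (None, 2, 1977)
```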

Page 37: Introduction to Structured Query Language and Databases

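Data Types

Each column should hold a single “data type”, one of five or more categories. (SQLite itself will accept any type of value in any column, so keeping types consistent is up to you.)

INTEGER. The value is a signed integer, from -2^63 to 2^63-1 (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)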

• You may see other types in other databases such as SMALLINT or BIGINT which make storing more efficient.

REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.

• Other databases may offer FLOAT, SINGLE (single precision, 4-byte storage), or DOUBLE (double precision, 8-byte storage)

TEXT. The value is a text string, stored using the database encoding

• CHAR(n) reserves n characters of storage; VARCHAR(n) uses variable storage sizes (SQLite accepts these names but does not enforce the length)

• Let's not talk about encoding, that will take another hour

BLOB. Binary Large Object: not a number or text, e.g. a sound or a picture, stored exactly as it was input

DATE/TIME: most DBs have these formats

Page 38: Introduction to Structured Query Language and Databases

Handling Date/Time

Database vendors differ on how to store temporal values, but most have specific date and time types

SQLite has no dedicated date/time type, for simplicity

SQLite can store dates and times as text, as real numbers (Julian day numbers), or as integers (Unix time).

Best practice

• store dates as year (4 digits) - month (2 digits) - day (2 digits): “2014-03-23”

• store times as hours:minutes:seconds: “15:03:04”

Page 39: Introduction to Structured Query Language and Databases

Working with Dates

Goal: convert the 3 date columns to a single date for date calculations and comparisons.

SQLite uses special date functions. See https://www.sqlite.org/lang_datefunc.html This can be cumbersome but is most flexible.

To build a date:

select date(printf('%4d-%02d-%02d', year, month, day)) as "survey_date", plot, species_id from plot_surveys;
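A sketch of building and using dates, assuming Python's sqlite3 module and three made-up surveys. Because date() produces YYYY-MM-DD text, dates compare and sort correctly as plain strings, and julianday() (another function on the SQLite date-functions page) converts them to day numbers for arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plot_surveys (year INTEGER, month INTEGER, day INTEGER, plot INTEGER, species_id TEXT)"
)
conn.executemany("INSERT INTO plot_surveys VALUES (?, ?, ?, ?, ?)", [  # hypothetical rows
    (1977, 7, 16, 2, "NL"),
    (1977, 8, 19, 3, "DM"),
    (1980, 1, 5, 4, "OT"),
])

dates = conn.execute("""
    SELECT date(printf('%4d-%02d-%02d', year, month, day)) AS survey_date, plot, species_id
    FROM plot_surveys
    ORDER BY survey_date
""").fetchall()

# ISO 8601 text dates compare correctly as strings...
since_1980 = conn.execute("""
    SELECT count(*) FROM plot_surveys
    WHERE date(printf('%4d-%02d-%02d', year, month, day)) >= '1980-01-01'
""").fetchone()[0]

# ...and julianday() turns them into day numbers for date arithmetic.
days_between = conn.execute(
    "SELECT julianday('1977-08-19') - julianday('1977-07-16')"
).fetchone()[0]

print(dates)         # [('1977-07-16', 2, 'NL'), ('1977-08-19', 3, 'DM'), ('1980-01-05', 4, 'OT')]
print(since_1980)    # 1
print(days_between)  # 34.0
```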


Page 40: Introduction to Structured Query Language and Databases

Multi-table databases

Page 41: Introduction to Structured Query Language and Databases

How do we add a group characteristic?

This data has info on all animals that were trapped. Sometimes birds or lizards were trapped and recorded.

Goal: we want to view all rodent data, by excluding all non-rodents

Solution: create a table of species and add a field for the group (taxa)

Page 42: Introduction to Structured Query Language and Databases

Species information in plot_surveys table

Goal: list only species information in this table

select species_id, genus, species from plot_surveys order by genus, species;

The results repeat each species once for every time it was trapped

Goal: show each species just one time. Solution: use the “distinct” keyword for unique data

select distinct species_id, genus, species from plot_surveys order by genus, species;

Note: we could also use ‘group by’ on the same three columns, which gives the same list; DISTINCT is simpler when no aggregate function is needed

Bonus: save this as a view by prefixing it with create view species_present as
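A hypothetical three-row check, using Python's sqlite3 module and made-up data, that DISTINCT removes the repeated species and that GROUP BY on the same columns, with no aggregate function at all, returns the same list in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plot_surveys (species_id TEXT, genus TEXT, species TEXT)")
conn.executemany("INSERT INTO plot_surveys VALUES (?, ?, ?)", [
    ("OT", "Onychomys", "torridus"),
    ("OL", "Onychomys", "leucogaster"),
    ("OT", "Onychomys", "torridus"),  # same species trapped again
])

distinct_rows = conn.execute(
    "SELECT DISTINCT species_id, genus, species FROM plot_surveys ORDER BY genus, species"
).fetchall()

# GROUP BY on the same columns, without any aggregate function
grouped_rows = conn.execute(
    "SELECT species_id, genus, species FROM plot_surveys "
    "GROUP BY species_id, genus, species ORDER BY genus, species"
).fetchall()

print(distinct_rows)                  # [('OL', 'Onychomys', 'leucogaster'), ('OT', 'Onychomys', 'torridus')]
print(distinct_rows == grouped_rows)  # True
```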


Page 43: Introduction to Structured Query Language and Databases

Create a new table and fill that table with species

Goal: be able to add columns to species for more refined queries

Create table

create table species ( species_id text, genus text, species text, taxa text);

Add data. We can insert data directly from a select:

insert into "species" (species_id, genus, species) select distinct species_id, genus, species from plot_surveys;

The next step is to enter all the taxa values for each species
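A small sketch of the whole sequence, run through Python's standard-library sqlite3 module. It creates the species table, fills it with insert ... select distinct, and then enters taxa values with update statements. The rows, including the bird, are hypothetical. Using update per genus is only one way to fill in taxa; in a real project you might edit the values by hand in SQLiteStudio instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical plot_surveys rows, including one non-rodent capture.
con.execute("create table plot_surveys (species_id text, genus text, species text)")
con.executemany(
    "insert into plot_surveys values (?, ?, ?)",
    [
        ("DM", "Dipodomys", "merriami"),
        ("DM", "Dipodomys", "merriami"),
        ("OL", "Onychomys", "leucogaster"),
        ("AB", "Amphispiza", "bilineata"),
    ],
)

# Same statements as on the slide: create the table, then insert from a select.
con.execute("create table species (species_id text, genus text, species text, taxa text)")
con.execute(
    'insert into "species" (species_id, genus, species) '
    "select distinct species_id, genus, species from plot_surveys"
)

# taxa starts out NULL; fill it in (the genus-to-taxa mapping here is illustrative).
con.execute("update species set taxa = 'Rodent' where genus in ('Dipodomys', 'Onychomys')")
con.execute("update species set taxa = 'Bird' where genus = 'Amphispiza'")
con.commit()

species = con.execute("select species_id, taxa from species order by species_id").fetchall()
print(species)
```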

We are going to cheat and skip to a database that is complete…


Page 44: Introduction to Structured Query Language and Databases

Download the Final Version of the Portal Survey DB

In the interest of time, we’ll download a database that already has species as a separate table.

1. Please download http://git.io/vcYSQ (file: sql_class_final.sqlite3)

2. Open it with SQLiteStudio: in the “Database” menu, select “connect to database”, then browse to the final DB downloaded in step 1

3. Confirm there are 3 tables in this database
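Step 3 can also be checked with a query, because every SQLite database lists its tables in the built-in sqlite_master table. The sketch below uses Python's standard-library sqlite3 module. For the demo it builds an in-memory database with three hypothetical table names, since the real names in sql_class_final.sqlite3 may differ.

```python
import sqlite3


def list_tables(con):
    """Return the names of user tables in a SQLite database, sorted by name."""
    rows = con.execute(
        "select name from sqlite_master "
        "where type = 'table' and name not like 'sqlite_%' "  # skip SQLite's internal tables
        "order by name"
    ).fetchall()
    return [name for (name,) in rows]


# Demo on an in-memory database with three hypothetical tables.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    create table surveys (record_id integer primary key, species_id text);
    create table species (species_id text primary key, genus text);
    create table plots (plot_id integer primary key, plot_type text);
    """
)
tables = list_tables(demo)
print(tables, len(tables) == 3)
```

With the downloaded file you would pass `sqlite3.connect("sql_class_final.sqlite3")` instead. Be aware that `connect()` silently creates a new, empty database if the path is wrong, which would show zero tables. In the sqlite3 command-line shell, `.tables` gives the same list.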


Page 45: Introduction to Structured Query Language and Databases

The R in RDBMS is for Relational

Goal: keep information simple and less redundant

E.g., species information is stored here as species_id, genus, species, and taxa.

Using multiple related tables is where SQL really shines.

In the surveys table we only need one of these values (species_id); the others are redundant

We can create a “species” table to hold this information, and “link” the two tables together.

Create a new species table


Page 46: Introduction to Structured Query Language and Databases

Tables are linked on a common “key”:

species.species_id

surveys.species_id
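The link can also be declared in the schema so SQLite enforces it. The sketch below runs through Python's standard-library sqlite3 module and assumes species_id is the primary key of species and that surveys.species_id is declared as a foreign key with `references`. The course database may not be defined this way. SQLite only checks foreign keys after `pragma foreign_keys = on`. Once it is on, a survey row pointing at a species that does not exist is rejected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("pragma foreign_keys = on")  # off by default in SQLite

# Genus, species, and taxa are stored once, in species; surveys keeps only the key.
con.execute(
    "create table species (species_id text primary key, genus text, species text, taxa text)"
)
con.execute(
    """create table surveys (
           record_id integer primary key,
           year integer, month integer, weight real,
           species_id text references species(species_id)
       )"""
)
con.execute("insert into species values ('OL', 'Onychomys', 'leucogaster', 'Rodent')")
con.execute("insert into surveys (year, month, weight, species_id) values (1977, 7, 25.0, 'OL')")


def try_insert_unknown_species(con):
    """Attempt to add a survey row whose species_id has no match in species."""
    try:
        con.execute(
            "insert into surveys (year, month, weight, species_id) values (1977, 7, 30.0, 'ZZ')"
        )
        return True
    except sqlite3.IntegrityError:  # "FOREIGN KEY constraint failed"
        return False


accepted = try_insert_unknown_species(con)
print(accepted)
```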

Page 47: Introduction to Structured Query Language and Databases

Full Database Model

How do we ask questions about groups of species?


Page 48: Introduction to Structured Query Language and Databases

Summarize species using joins

Display data for grasshopper mice (genus Onychomys) using a join with the species table

Open the SQL Editor window again and enter:

select weight, year, month, genus, species, taxa from surveys inner join species on surveys.species_id = species.species_id where species.genus = 'Onychomys';
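To see what the join does row by row, here is a sketch of the same query on tiny hypothetical surveys and species tables, run through Python's standard-library sqlite3 module. The only change from the slide is an added `order by` so the output order is predictable. Each survey row is paired with the species row that has the same species_id, and the where clause then keeps only genus Onychomys.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table species (species_id text, genus text, species text, taxa text);
    create table surveys (year integer, month integer, species_id text, weight real);
    insert into species values ('OL', 'Onychomys', 'leucogaster', 'Rodent'),
                               ('OT', 'Onychomys', 'torridus', 'Rodent'),
                               ('DM', 'Dipodomys', 'merriami', 'Rodent');
    insert into surveys values (1977, 7, 'OL', 25.0),
                               (1977, 8, 'OT', 22.0),
                               (1977, 8, 'DM', 41.0);
    """
)

rows = con.execute(
    """select weight, year, month, genus, species, taxa
       from surveys
       inner join species on surveys.species_id = species.species_id
       where species.genus = 'Onychomys'
       order by year, month"""
).fetchall()
print(rows)
```

An inner join drops survey rows whose species_id has no match in species, so unidentified captures silently disappear. A `left join` would keep them, with NULLs in the species columns.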


Page 49: Introduction to Structured Query Language and Databases

Summarize species using joins

Find the heaviest rodents: display the maximum weight for each rodent species

Using all the techniques you’ve learned so far:

select genus, species, max(weight) as maxweight from surveys inner join species on surveys.species_id = species.species_id where species.taxa = 'Rodent' group by genus, species order by maxweight DESC;
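A sketch of the heaviest-rodent query on a few hypothetical rows, run through Python's standard-library sqlite3 module. The query text follows the slide. The data includes a bird, which the where clause filters out, and a rodent capture with no recorded weight, which max() ignores because aggregate functions skip NULLs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table species (species_id text, genus text, species text, taxa text);
    create table surveys (species_id text, weight real);
    insert into species values ('DM', 'Dipodomys', 'merriami', 'Rodent'),
                               ('NL', 'Neotoma', 'albigula', 'Rodent'),
                               ('AB', 'Amphispiza', 'bilineata', 'Bird');
    insert into surveys values ('DM', 40.0), ('DM', 52.0), ('DM', NULL),
                               ('NL', 150.0), ('NL', 200.0),
                               ('AB', 15.0);
    """
)

heaviest = con.execute(
    """select genus, species, max(weight) as maxweight
       from surveys
       inner join species on surveys.species_id = species.species_id
       where species.taxa = 'Rodent'
       group by genus, species
       order by maxweight desc"""
).fetchall()
print(heaviest)
```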


Page 50: Introduction to Structured Query Language and Databases

Cardinality

There is one record for each species, but each species occurs many times in surveys.

Species -> surveys is one-to-many

Species were trapped many times in many traps, and each trap caught many species

Species -> traps is many-to-many
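Both kinds of cardinality can be measured with group by. The sketch below is run through Python's standard-library sqlite3 module and assumes each survey row records its trap location in a column named plot, with made-up values. Counting rows per species shows the one-to-many side. count(distinct ...) in both directions shows that species and plots are many-to-many, with surveys acting as the table that links them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical captures: 'plot' stands in for the trap location.
con.execute("create table surveys (species_id text, plot integer)")
con.executemany(
    "insert into surveys values (?, ?)",
    [("DM", 1), ("DM", 2), ("DM", 2), ("OL", 1)],
)

# One-to-many: each species has many survey rows.
captures = con.execute(
    "select species_id, count(*) from surveys group by species_id order by species_id"
).fetchall()

# Many-to-many, one direction: a species is caught on several plots...
plots_per_species = con.execute(
    "select species_id, count(distinct plot) from surveys group by species_id order by species_id"
).fetchall()

# ...and the other direction: a plot catches several species.
species_per_plot = con.execute(
    "select plot, count(distinct species_id) from surveys group by plot order by plot"
).fetchall()

print(captures, plots_per_species, species_per_plot)
```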

Page 51: Introduction to Structured Query Language and Databases

Optional: Expanding the data

Goal: determine if there is a correlation between abundance (captures) and precipitation.

We need to prepare the data for analysis.

Rainfall data is available but needs to be imported into the database.

1. Read the metadata and download the CSV file

2. Examine the contents to determine the table structure we need

3. Create a new table to hold the precipitation data

4. Import the data into this new table
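Once the rainfall table exists, the analysis itself is a join plus a correlation. The sketch below runs in Python with the standard-library sqlite3 module. It counts captures per year and month, joins those counts to rainfall, skips missing precipitation (whether imported as -99 or as NULL), and computes a Pearson correlation by hand. The table contents are hypothetical, chosen so the answer is easy to check. Pearson correlation is an assumption here; the slides do not say which measure to use.

```python
import math
import sqlite3


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table surveys (year integer, month integer, species_id text);
    create table rainfall (year integer, month integer, precipitation double);
    insert into surveys values (1980, 1, 'DM'), (1980, 1, 'OL'),
        (1980, 2, 'DM'), (1980, 2, 'DM'), (1980, 2, 'OL'), (1980, 2, 'NL'),
        (1980, 3, 'DM'), (1980, 3, 'DM'), (1980, 3, 'DM'),
        (1980, 3, 'OL'), (1980, 3, 'OL'), (1980, 3, 'NL'),
        (1980, 4, 'DM');
    insert into rainfall values (1980, 1, 10.0), (1980, 2, 20.0),
                                (1980, 3, 30.0), (1980, 4, -99);
    """
)

# Captures per month next to that month's rainfall; -99 means missing.
monthly = con.execute(
    """select r.year, r.month, count(*) as captures, r.precipitation
       from rainfall as r
       inner join surveys as s on s.year = r.year and s.month = r.month
       where r.precipitation is not null and r.precipitation <> -99
       group by r.year, r.month, r.precipitation
       order by r.year, r.month"""
).fetchall()

corr = pearson([row[3] for row in monthly], [row[2] for row in monthly])
print(monthly, corr)
```

Because of the inner join, months with rainfall but zero captures drop out. Use `left join` with `count(s.species_id)` if those zero-capture months should count in the analysis.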

Page 52: Introduction to Structured Query Language and Databases

Optional Material

Introduction to SQL

Page 53: Introduction to Structured Query Language and Databases

Optional: Adding Rainfall Data

Searching the metadata, we find there is associated precipitation data. Can we incorporate that?

Variable codes, definitions, and notes:

Variable name | Variable definition | Storage type | Notes
Year | Year data collected | Integer | 1980–1989
Month | Month data collected | Integer | 1–12
Precipitation | Precipitation amount in rain gauge | Double | Measurement unit: millimeters; -99 = missing data

Table adapted from http://esapubs.org/archive/ecol/E090/118/Portal_precipitation_metadata.htm

Download text file from github (TO DO: address)

Page 54: Introduction to Structured Query Language and Databases

Optional: Adding Rainfall Data

Create table:

create table rainfall (year integer, month integer, precipitation double);

Download data from

http://esapubs.org/archive/ecol/E090/118/Portal_precipitation_19801989.csv

Import it using the SQLiteStudio Tools menu: Import…
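As an alternative to the SQLiteStudio import dialog, the CSV can be loaded from a script. This sketch uses Python's standard-library csv and sqlite3 modules. It assumes the file has a header row named Year, Month, Precipitation, matching the metadata; the real file's header may differ. It stores the -99 missing-data code as NULL so that averages and other aggregates skip it. The sample rows are made up. If you import with SQLiteStudio instead, `update rainfall set precipitation = null where precipitation = -99;` achieves the same cleanup afterward.

```python
import csv
import io
import sqlite3


def load_rainfall(con, csv_file):
    """Insert Year,Month,Precipitation rows into rainfall; -99 (missing) becomes NULL."""
    rows = []
    for rec in csv.DictReader(csv_file):
        precip = float(rec["Precipitation"])
        rows.append((int(rec["Year"]), int(rec["Month"]), None if precip == -99 else precip))
    con.executemany(
        "insert into rainfall (year, month, precipitation) values (?, ?, ?)", rows
    )
    con.commit()
    return len(rows)


con = sqlite3.connect(":memory:")
con.execute("create table rainfall (year integer, month integer, precipitation double)")

# Hypothetical sample standing in for the downloaded CSV.
sample = io.StringIO("Year,Month,Precipitation\n1980,1,12.5\n1980,2,-99\n1980,3,3.0\n")
n = load_rainfall(con, sample)
missing = con.execute("select count(*) from rainfall where precipitation is null").fetchone()[0]
print(n, missing)
```

With the real file, open it with `open("Portal_precipitation_19801989.csv", newline="")` and pass the file object to `load_rainfall`.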


Page 55: Introduction to Structured Query Language and Databases

Epilogue: About SQL Databases


Page 56: Introduction to Structured Query Language and Databases

About our software

SQLite: a C-language library for working with a single-file database, with full SQL and blazing speed. It provides a command-line interface only. Maximum database size is 140 terabytes (~10^12 rows of data). https://www.sqlite.org/

SQLiteStudio: a desktop application for working with SQLite-format files; it includes the SQLite library

• http://sqlitestudio.pl
• The most fully featured of the free, cross-platform SQLite desktop programs
• Free, but a little funky
• Manual install from Zip or DMG required (see instructions)

Many other programs and connectors let you use SQLite-format files from statistical software, office applications, scripting languages, the web, etc.


Page 57: Introduction to Structured Query Language and Databases

About SQLite

SQLite is a “database engine” to which you send SQL commands to work on a database.

The program to send commands to the engine is called the client program. Ways to send SQL commands to the SQLite engine:

• Program like SQLiteStudio

• Script or other program

• SQLite’s stand-alone command line interface (CLI) (comes with a download of SQLite)

Most SQLite tutorials and books use this CLI for teaching. OS X comes with it, but it may need a separate install on Windows and Linux.

Quick Demo

Page 58: Introduction to Structured Query Language and Databases

SQL is Declarative

The core of SQL is a declarative language.

• You state what you want; let the language processor figure out how to deliver the desired results.

• Analogy is a restaurant menu: you ask for what you want, and you don't need to know how it's prepared

• Less conceptual burden, because there is no implementation burden

• Fixed command structure: SELECT stuff FROM something WHERE something LIKE 'radical'

• Focus on data, not process = quicker coding for results

Programming languages such as C, Java, and Python are imperative

• Each step must be explicit

• Analogy is a recipe: instructions for preparation

• Process oriented, flexible

• Flow control: branching based on a previous condition (possible in declarative SQL, but verbose)

• More coding and conceptual overhead
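To make the contrast concrete, the sketch below answers the same question in both styles: which grasshopper mouse (species_id 'OL') captures weighed over 20 g? It uses made-up rows and Python's standard-library sqlite3 module. The imperative version spells out the loop, the missing-value check, and the comparison. The declarative version only describes the result, and SQLite decides how to scan and filter. Note that `weight > 20` is not true for a NULL weight, so SQL skips missing weights without an explicit check.

```python
import sqlite3

# Hypothetical captures: (species_id, weight in grams); None = not weighed.
surveys = [("OL", 25.0), ("OL", 18.0), ("DM", 41.0), ("OL", None)]


def heavy_ol_imperative(rows):
    """Recipe style: every step is written out."""
    result = []
    for species_id, weight in rows:
        if species_id == "OL" and weight is not None and weight > 20:
            result.append(weight)
    return sorted(result)


def heavy_ol_declarative(rows):
    """Menu style: state what you want and let SQLite work out how."""
    con = sqlite3.connect(":memory:")
    con.execute("create table surveys (species_id text, weight real)")
    con.executemany("insert into surveys values (?, ?)", rows)
    found = con.execute(
        "select weight from surveys where species_id = 'OL' and weight > 20 order by weight"
    ).fetchall()
    con.close()
    return [w for (w,) in found]


print(heavy_ol_imperative(surveys), heavy_ol_declarative(surveys))
```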


Page 59: Introduction to Structured Query Language and Databases

Program + SQL = Efficient

SQL is a great way to get data out of a system, so many programs for science and the web use SQL inside scripts.

Pseudo code:

if (today.weekday?)
    SQL = "select stuff from work_to_dos;"
else  // weekend!
    SQL = "select stuff from home_to_dos;"
endif
daily_do_list = Db.getresults(SQL)
print(daily_do_list)

Yes, this could also be expressed in SQL alone if the to-do items were stored in a single table with a category, using a 'case' expression in the where clause based on the date.
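A runnable version of the pseudocode, using Python's standard-library sqlite3 module. It shows both options: the program choosing which SQL to run, and the single-table alternative that uses a case expression. The tables work_to_dos, home_to_dos, and to_dos and their contents are hypothetical. SQLite's `strftime('%w', ...)` returns '0' for Sunday through '6' for Saturday, while Python's `weekday()` counts Monday as 0.

```python
import datetime
import sqlite3


def daily_do_list(con, today):
    """Program picks the SQL, following the pseudocode."""
    if today.weekday() < 5:  # Monday=0 ... Friday=4
        sql = "select stuff from work_to_dos"
    else:  # weekend!
        sql = "select stuff from home_to_dos"
    return [stuff for (stuff,) in con.execute(sql)]


def daily_do_list_sql(con, today):
    """Pure-SQL alternative: one categorized table plus a case expression."""
    sql = """
        select stuff from to_dos
        where category = case when strftime('%w', ?) in ('0', '6')
                              then 'home' else 'work' end
    """
    return [stuff for (stuff,) in con.execute(sql, (today.isoformat(),))]


con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table work_to_dos (stuff text);
    create table home_to_dos (stuff text);
    create table to_dos (stuff text, category text);
    insert into work_to_dos values ('grade exams');
    insert into home_to_dos values ('mow lawn');
    insert into to_dos values ('grade exams', 'work'), ('mow lawn', 'home');
    """
)

saturday = datetime.date(2024, 1, 6)
monday = datetime.date(2024, 1, 8)
print(daily_do_list(con, saturday), daily_do_list_sql(con, saturday))
```

The first style keeps the branching in the program; the second keeps it in the data, so adding a new category means adding rows rather than changing code.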


Page 60: Introduction to Structured Query Language and Databases

How to use SQL in a script

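Goal: I’d like to use SQLite in a shell script as part of a workflow

Once SQLite is installed, you can use it at the command line/terminal as follows. Most books on SQLite use this technique.

sqlite3 sql_class_final.sqlite3 "select * from surveys where year=1977;"

To run SQL saved in a file instead, redirect the file into sqlite3 (my_query.sql is a hypothetical file name):

sqlite3 sql_class_final.sqlite3 < my_query.sql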
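If the workflow is driven from Python rather than a shell script, the same query can be run and exported to CSV for the next step. The sketch below uses only the standard-library sqlite3 and csv modules. The helper name export_query_csv is hypothetical, and the demo uses an in-memory table rather than the course file. A `?` placeholder passes the year safely instead of pasting it into the SQL string. For the shell version, the sqlite3 CLI's `-header` and `-csv` options produce similar output.

```python
import csv
import io
import sqlite3


def export_query_csv(con, sql, out, params=()):
    """Run a query and write a header row plus all result rows as CSV; return the row count."""
    cur = con.execute(sql, params)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count


# Demo with a hypothetical in-memory surveys table.
demo = sqlite3.connect(":memory:")
demo.execute("create table surveys (record_id integer, year integer, species_id text)")
demo.executemany("insert into surveys values (?, ?, ?)", [(1, 1977, "NL"), (2, 1978, "DM")])

buf = io.StringIO()
n = export_query_csv(demo, "select * from surveys where year = ?", buf, (1977,))
print(buf.getvalue())
```

Against the real database, the call would look like `export_query_csv(sqlite3.connect("sql_class_final.sqlite3"), "select * from surveys where year = ?", sys.stdout, (1977,))`, after `import sys`.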

Page 61: Introduction to Structured Query Language and Databases

Resources

http://sqlite.org for documentation and details

Nice book on SQLite with great detail:

https://www.safaribooksonline.com/library/view/using-sqlite/9781449394592/