Introduction to Structured Query Language and Databases for Researchers
Patrick Bills, Research Consultant
Institute for Cyber-Enabled Research, MSU
What is this Class?
3-hour workshop introducing Structured Query Language (SQL)
Understand different ways that SQL is used. Light on theory, heavy on practice
SQL as synonym for relational data model
method for efficient data entry
The SQL language for analytics
software to interact with Relational databases in files or servers
Example SQL from the MSU Mara Hyena Project, Dr. Kay Holekamp
Compare SQL to Spreadsheets
Spreadsheets
• are free-form. Data can be in a table but can be anywhere, multiple tables per sheet.
• Each ‘cell’ of spreadsheet can contain anything
• Rows can get mixed up : sorting one column and not the others
• Calculations (Functions) on cells are stored in a new cell
SQL Databases
• only separate tables; rows always stay together. A row is also called a “record”
• items in rows (fields) should have just one type of data.
• Calculations done with SQL, and always result in a table
• SQL is stored in “views”
How we will work
Computer TEXT - indicates SQL code I expect you type
Don’t type commands preceded by # or -- (these are code comments)
The sticky notes are provided to help us help you. When doing exercises, start by taking your sticky off, then place a green or red sticky note on top of your laptop:
No Sticky I am working
Green I am done and ready to move on (yea!)
Red I am stuck and need some help (or more time)
Institute for Cyber-Enabled Research
How we will work
We are going to learn basics using lecture and hands-on exercises.
Exercises are denoted by the following icon in this presentation:
We will mostly be typing commands into our computer
Bold italic indicates terminal commands which I expect you to type on your terminal
Don’t type commands preceded by # (comments)
The sticky notes are provided to help us help you. When doing exercises, place a sticky note on top of your laptop:
• No Sticky I am working
• Green I am done and ready to move on (yea!)
• Red I am stuck and need some help (or more time)
How we will work: software
There are many ways to use SQL. Work with desktop software today
Make sure you have SQLiteStudio Installed
Free, but funky. A little tricky to work with, but best free database program on Windows, Mac, and Linux
There are many brands of SQL database formats: this uses SQLite
You may have heard of working with a database server, but we can also work with a database file
Part 1. Using SQL on a single table
Example SQLite file: Ecology Lab Portal DB
1. Download the SQLite file from http://git.io/vcL6Z (also available at https://github.com/billspat/portaldb/blob/master/sql_class_one_table.sqlite3?raw=true)
2. Confirm that the file was downloaded as sql_class_one_table.sqlite3
3. Move it to your documents folder or desktop if you like
Open File and examine
Start SQLiteStudio
Open “Database” menu and select “Add database”
click the file folder and find sql_class_one_table.sqlite3
note: the Green (+) is to create a new database; we are opening one
How to View Tables with SQLiteStudio
• select a table on the left, click ‘triangles’ to display properties
• double click a column for details
• ignore Indexes and Triggers for now
• right side of screen also shows column details
Tables and Views
How to View Tables with SQLiteStudio
• click the [Data] tab on top right to display the data view
• scroll horizontally and vertically to see more data
• click column headings to sort the data grid
• 1000 rows are retrieved at a time for efficiency; the data has >34,000 rows
Explore the data
• date in 3 columns month, day, year, all integers
• plot code as a numeric ID code
• species as a code, scientific name, and taxa (group)
• measurements: numeric or blank (null)
To prepare to use SQL to work with data from the db, we have to know what is in it.
About this database
Long-term monitoring and experimental manipulation of a rodent community in the Chihuahuan Desert near Portal, Arizona.
• JH Brown, University of New Mexico; TJ Valone, Saint Louis University; SK Morgan Ernest, Utah State University
• 24 experimental plots were established in 1977 in the Chihuahuan desert ecosystem near Portal, Arizona with various experimental manipulations over the years
• Monitoring of the composition and abundances of ants, plants, and rodents has occurred continuously on all 24 plots.
• From 1977–2002, individual-level data on rodents (i.e., species, sex, size, reproductive condition) was collected monthly for each plot.
• From 1980–2002, precipitation was recorded at the study site.
Goal: Summarize ecological data for Onychomys spp.
How much data do we have? When and where was it collected? How often was it observed in the 24 plots? What were the average body measurements? Bonus points: can we reuse our work for other species?
Dr. Ashlee Rowe studies carnivorous grasshopper mice, Onychomys torridus, tolerance to scorpion venom. Can we cite these Portal, AZ data for body measurements?
SQL Commands with SQLiteStudio
Before we can use SQL to answer these questions, we need to learn where to put SQL in the SQLiteStudio program:
• From the “Tools” menu, select “open SQL Editor”
• In the query window type SQL code.
• Press the Blue Triangle in upper left to ‘run’ the SQL.
• Data appear in Grid below
• Status or Error messages below that
SQL Commands with SQLiteStudio
[Screenshot of the SQL Editor, labeled:]
1. Type SQL here
2. Press this (the Blue Triangle) to run the SQL
3. Results
4. Status or error
Selecting Data with SQL
Goal : use SQL to show all rows for some columns
Enter the following SQL in the SQL Editor Window:
select year, plot from plot_surveys;
Note: SQLiteStudio shows 1000 rows per page. Use the pagination buttons to see more
Show more columns:
select year, plot, genus, species_id from plot_surveys;
Selecting Data with SQL
Basic select statement template is
SELECT column1, column2, … FROM table;
SQL keywords and names are not case sensitive (text values in quotes are); upper case keywords by convention
SQL can have line breaks and white space
Column or table names must be quoted if they contain spaces or special characters. (Use straight double quotes for names in SQLite; single quotes are for text values)
SELECT "favorite thing", fav_id FROM "a few of my!";
Our database uses simple column and table names
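A minimal sketch of the quoting rule, run through Python's built-in sqlite3 module rather than SQLiteStudio. The table and its single row are made up to match the "favorite thing" example; they are not part of the class database.

```python
import sqlite3

# In-memory database with hypothetical names that contain spaces and "!".
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "a few of my!" ("favorite thing" TEXT, fav_id INTEGER)')
conn.execute('INSERT INTO "a few of my!" VALUES (?, ?)', ("raindrops on roses", 1))

# Keywords may be upper or lower case, and line breaks are ignored,
# so these two statements should return the same rows.
upper = conn.execute('SELECT "favorite thing", fav_id FROM "a few of my!";').fetchall()
lower = conn.execute('''
    select "favorite thing",
           fav_id
    from "a few of my!";
''').fetchall()

print(upper)
print(upper == lower)
```

Without the double quotes, the space in the column name would be read as the end of the name and the statement would fail.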
Selecting Data with SQL
Goal : format output of SQL for better exporting
rename columns with the “as” keyword
select Year, plot as "Plot Number" from plot_surveys;
You can include ALL columns with the special character *
select * from plot_surveys
Calculating with Arithmetic
Goal : convert weight from grams to ounces
SQL supports arithmetic with + - * /
select year, plot, species_id, weight*0.035274 as weight_oz from plot_surveys;
Bonus: convert to kilograms instead (divide by 1000.0, not 1000: dividing one integer by another drops the fraction)
select year, plot, species_id, weight/1000.0 as weight_kg from plot_surveys;
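A small sketch of how division behaves in SQLite, using Python's built-in sqlite3 module. The two rows are invented, and the weight column is assumed to be stored as an integer; check the column type in the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed miniature of plot_surveys; weight is declared INTEGER for illustration.
conn.execute(
    "CREATE TABLE plot_surveys (year INTEGER, plot INTEGER, species_id TEXT, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?, ?)",
    [(1989, 3, "OT", 24), (1989, 5, "OL", 40)],
)

rows = conn.execute("""
    SELECT species_id,
           weight / 1000   AS int_division,  -- integer / integer drops the fraction
           weight / 1000.0 AS weight_kg      -- a decimal operand keeps it
    FROM plot_surveys
    ORDER BY weight
""").fetchall()

for row in rows:
    print(row)
```

Multiplying by a decimal constant, as in the ounces conversion, does not have this problem because one operand is already a real number.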
Calculating with Functions
Goal : remove extra decimals from conversion.
Round off converted value with the round() function.
select year, plot, species_id, round(weight*0.035274,2) as weight_oz from plot_surveys;
functions take the form of function(argument1, argument2, etc)
The rounding function form is round(expression, <number of decimals>)
and can be combined by nesting : function1(function2(column))
Most other SQL databases include math functions, e.g. log(), cos(); SQLite traditionally left them out to keep the code small (newer versions can include them as an optional build feature).
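A short sketch of round() and of nesting one function inside another, run with Python's built-in sqlite3 module. The two weights are invented, and abs() is only there to show nesting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-row stand-in for plot_surveys.
conn.execute("CREATE TABLE plot_surveys (species_id TEXT, weight INTEGER)")
conn.executemany("INSERT INTO plot_surveys VALUES (?, ?)", [("OL", 56), ("OT", 24)])

rows = conn.execute("""
    SELECT species_id,
           round(weight * 0.035274, 2)       AS weight_oz,  -- round(expression, decimals)
           abs(round(weight * -0.035274, 2)) AS nested      -- function1(function2(...))
    FROM plot_surveys
    ORDER BY species_id
""").fetchall()

for row in rows:
    print(row)
```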
Combining Characters with || operator
Goal : put both Genus and Species in one column of output.
Character fields can be combined with the “||” operator. 'A' || 'B' => 'AB'; add a space with ' '
select Year, plot as "Plot Number", genus || ' ' || species as "Species" from plot_surveys;
Bonus: how to combine date columns to 1977-3-1? Year || '-' || Month || '-' || Day as "Survey Date"
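A minimal sketch of the || operator, including the bonus date, using Python's built-in sqlite3 module. The single row is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-row stand-in for plot_surveys.
conn.execute(
    "CREATE TABLE plot_surveys "
    "(year INTEGER, month INTEGER, day INTEGER, plot INTEGER, genus TEXT, species TEXT)"
)
conn.execute("INSERT INTO plot_surveys VALUES (1977, 3, 1, 2, 'Onychomys', 'torridus')")

row = conn.execute("""
    SELECT genus || ' ' || species             AS "Species",
           year || '-' || month || '-' || day  AS "Survey Date"
    FROM plot_surveys
""").fetchone()

print(row)
```

Numbers are converted to text when they are joined with ||, so the date comes out without leading zeros.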
SQL Filtering with “WHERE”
Add a “WHERE…” clause to a SQL statement to add conditions
SELECT column1, column2, etc FROM table WHERE conditions;
Try this: list only rows for one genus:
select year, month, plot, genus, species, weight from plot_surveys where genus = 'Onychomys';
SQL Filtering with “WHERE”
A WHERE clause can use:
• Numeric comparisons
• Character comparisons (using single quotes)
• Combinations of logical conditions, using AND and OR
Try this: find “heavy” grasshopper mice (40 g or more):
select year, month, plot, genus, species, weight from plot_surveys where genus = 'Onychomys' and weight >= 40;
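A small sketch of combining a character comparison and a numeric comparison with AND, using Python's built-in sqlite3 module. The four rows are invented; note the difference between >= and > at exactly 40 g.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of plot_surveys.
conn.execute("CREATE TABLE plot_surveys (year INTEGER, genus TEXT, species TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?, ?)",
    [
        (1989, "Onychomys", "leucogaster", 56),
        (1989, "Onychomys", "torridus", 24),
        (1990, "Onychomys", "leucogaster", 40),
        (1990, "Dipodomys", "merriami", 45),
    ],
)

# Both conditions must be true for a row to be returned.
at_least_40 = conn.execute("""
    SELECT species, weight FROM plot_surveys
    WHERE genus = 'Onychomys' AND weight >= 40
    ORDER BY weight
""").fetchall()

# Strictly greater than 40 leaves out the 40 g animal.
over_40 = conn.execute("""
    SELECT species, weight FROM plot_surveys
    WHERE genus = 'Onychomys' AND weight > 40
    ORDER BY weight
""").fetchall()

print(at_least_40)
print(over_40)
```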
Re-using SQL: saving SQL files
We’d like to re-use the SQL we are writing. There are several ways to do that.
Save a SQL file: a plain text file with one or more SQL commands
In SQLiteStudio, right click to select “save” or “open”
SQL files can have multiple SQL statements
Demo: download and open SQL file
Database Views : Reusable SQL
Goal: we want to keep some SQL in our database so that :
• we can keep it together with the tables in one file;
• it can be shared in the database; and
• it can be used like another table (for more SQL)
• shortcut for often used WHERE conditions,
• select * from plot_surveys where genus = 'Onychomys'
Database ‘Views’ in practice
Goal: save our species filter for re-use, a query shortcut
1. Develop SELECT statement
SELECT * FROM plot_surveys WHERE genus = 'Onychomys';
2. Prefix statement with “create…” to make a view
CREATE VIEW os_surveys AS SELECT * FROM plot_surveys WHERE genus = 'Onychomys';
3. Test. Double click the view to open, like a table
4. Use. Incorporate this view just as you would a table
SELECT * FROM os_surveys WHERE year=1989
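A minimal sketch of the four steps above, using Python's built-in sqlite3 module and a made-up three-row table. It also shows that a view stores the query, not a copy of the data: rows added to the table later appear in the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of plot_surveys.
conn.execute("CREATE TABLE plot_surveys (year INTEGER, genus TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?)",
    [(1989, "Onychomys", 24), (1990, "Onychomys", 30), (1989, "Dipodomys", 45)],
)

# Save the species filter as a view, then query it like a table.
conn.execute("CREATE VIEW os_surveys AS SELECT * FROM plot_surveys WHERE genus = 'Onychomys'")
before = conn.execute("SELECT * FROM os_surveys WHERE year = 1989").fetchall()

# A new row in the underlying table shows up in the view automatically.
conn.execute("INSERT INTO plot_surveys VALUES (1989, 'Onychomys', 28)")
after = conn.execute(
    "SELECT weight FROM os_surveys WHERE year = 1989 ORDER BY weight"
).fetchall()

print(before)
print(after)
```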
More ‘Views’ in practice
Goal: format the date nicely for this view
Dates take complex functions to work with. Let’s save our date in a format that SQLite likes with the printf() function.
1. Develop SELECT statement that combines date columns into new column, along with remaining columns.
SELECT printf('%4d-%02d-%02d', year, month, day) as "survey_date", * FROM plot_surveys WHERE genus = 'Onychomys';
2. Prefix statement with “create…” to make a view
CREATE VIEW os_surveys AS SELECT printf('%4d-%02d-%02d', year, month, day) as "survey_date", * FROM plot_surveys WHERE genus = 'Onychomys';
3. ERROR! Why? We can’t create a view that already exists. Need to “drop” the view first, then run the CREATE VIEW statement again
DROP VIEW IF EXISTS os_surveys;
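A short sketch of the error and the fix, using Python's built-in sqlite3 module. The one-row table is made up; the view name os_surveys and the printf() pattern come from the steps above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-row stand-in for plot_surveys, plus the earlier version of the view.
conn.execute("CREATE TABLE plot_surveys (year INTEGER, month INTEGER, day INTEGER, genus TEXT)")
conn.execute("INSERT INTO plot_surveys VALUES (1989, 3, 7, 'Onychomys')")
conn.execute("CREATE VIEW os_surveys AS SELECT * FROM plot_surveys WHERE genus = 'Onychomys'")

new_view = """
    CREATE VIEW os_surveys AS
    SELECT printf('%4d-%02d-%02d', year, month, day) AS survey_date, *
    FROM plot_surveys
    WHERE genus = 'Onychomys'
"""

# Creating a view under a name that is already taken raises an error.
try:
    conn.execute(new_view)
    failed = False
except sqlite3.OperationalError:
    failed = True

# Drop the old view (no error if it is missing), then create the new one.
conn.execute("DROP VIEW IF EXISTS os_surveys")
conn.execute(new_view)
row = conn.execute("SELECT survey_date, year FROM os_surveys").fetchone()

print(failed)
print(row)
```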
Empty data: NULLS
Goal : filter out empty data
SQL databases have a special value for fields when no data is present : NULL
You can find or ignore those with the “is null” or “is not null” where conditions.
Empty data — NULL values
Weights of rodents were not collected for all years, so we need to filter out rows with no weights.
Find where weight is not null
select year, month, genus, species, weight from plot_surveys where weight is not null;
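A minimal sketch of filtering on NULL, using Python's built-in sqlite3 module and three invented rows. It also shows why "is null" is needed: comparing with = NULL never matches anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of plot_surveys; Python's None is stored as NULL.
conn.execute("CREATE TABLE plot_surveys (year INTEGER, species TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?)",
    [(1978, "torridus", None), (1989, "torridus", 24), (1990, "leucogaster", None)],
)

with_weight = conn.execute(
    "SELECT year, weight FROM plot_surveys WHERE weight IS NOT NULL"
).fetchall()
missing = conn.execute(
    "SELECT year FROM plot_surveys WHERE weight IS NULL ORDER BY year"
).fetchall()
# "= NULL" is never true, so this returns no rows at all.
equals_null = conn.execute(
    "SELECT year FROM plot_surveys WHERE weight = NULL"
).fetchall()

print(with_weight)
print(missing)
print(equals_null)
```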
Summarize with Aggregate Functions
How many rows of data do we have for the species we are interested in?
We often want to summarize several rows: sum, average, count, etc.
SQL has “Aggregate functions” for basic calculations per group.
Goal: Count how many animals of our species were caught per year
Use count() and group by year
select year, count(*) from plot_surveys where genus = 'Onychomys' group by year;
Summarize with Aggregate Functions
Goal: take the average weight of Grasshopper mice caught per year
Use the avg() aggregate function - must also use “group by” clause for each year
select year, avg(weight) from plot_surveys where genus = 'Onychomys' group by year;
But what about when the weight is not taken (is null)?
select year, avg(weight) from plot_surveys where genus = 'Onychomys' and weight is not null group by year;
Same answer, because avg() skips nulls automatically (standard behavior for SQL aggregate functions, not just SQLite)
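A small sketch checking that avg() gives the same result with and without the "is not null" filter, using Python's built-in sqlite3 module. The five rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of plot_surveys with some missing weights.
conn.execute("CREATE TABLE plot_surveys (year INTEGER, genus TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?)",
    [
        (1989, "Onychomys", 20),
        (1989, "Onychomys", 30),
        (1989, "Onychomys", None),
        (1990, "Onychomys", 40),
        (1990, "Onychomys", None),
    ],
)

unfiltered = conn.execute("""
    SELECT year, avg(weight) FROM plot_surveys
    WHERE genus = 'Onychomys'
    GROUP BY year ORDER BY year
""").fetchall()

filtered = conn.execute("""
    SELECT year, avg(weight) FROM plot_surveys
    WHERE genus = 'Onychomys' AND weight IS NOT NULL
    GROUP BY year ORDER BY year
""").fetchall()

print(unfiltered)
print(unfiltered == filtered)
```

If NULL were treated as zero, the 1989 average would be about 16.7 instead of 25.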
Aggregate functions in practice
Goal: refine our results by adding a count of samples (n) and better formatting
Count() the rows with weights, round the average
select year, round(avg(weight),2) as weight_avg, count(record_id) as n from plot_surveys where genus = 'Onychomys' and weight is not null group by year;
Unlike avg(weight), count(record_id) counts every row, including rows where weight is null, so we need to exclude them here (count(weight) would skip them too)
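A minimal sketch of how count() treats NULLs depending on what is counted, using Python's built-in sqlite3 module. The three rows and the record_id values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of plot_surveys; one weight is missing.
conn.execute("CREATE TABLE plot_surveys (record_id INTEGER, year INTEGER, weight INTEGER)")
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?)",
    [(1, 1989, 20), (2, 1989, None), (3, 1989, 30)],
)

# count(*) and count(record_id) see all rows; count(weight) skips the NULL weight.
all_counts = conn.execute(
    "SELECT count(*), count(record_id), count(weight) FROM plot_surveys"
).fetchone()

# With the filter in place, counting record_id gives the number of weighed animals.
n_weighed = conn.execute(
    "SELECT count(record_id) FROM plot_surveys WHERE weight IS NOT NULL"
).fetchone()[0]

print(all_counts)
print(n_weighed)
```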
Bonus: create new column with year and month, group by both, and plot the time series of avg weight by month
Average weights for Grasshopper mice
Bonus: add the month field, and plot the time series
select year || '-' || month as year_month, round(avg(weight),2) as monthly_weight_avg from plot_surveys where genus = 'Onychomys' group by year, month;
[Figure: time series of average Onychomys weight (g) by month, 1978-7 through 2001-8]
Ordering tables
Database tables are not stored in any particular order
They are sorted as needed, or have an order based on key columns (more later).
Order results in queries using “ORDER BY”, which goes after “GROUP BY” if present
SELECT <columns>
FROM <table>
WHERE <condition1 and/or condition2 ….>
GROUP BY <columns>
ORDER BY <columns>
ORDER BY in practice
What are the largest and smallest weights of Onychomys spp?
Examine the range of weights of Onychomys by sorting from lightest to heaviest, then the other way
select weight, plot, year, month, genus, species from plot_surveys where genus = 'Onychomys' order by weight;
Sorting is ‘ascending’ by default, but the DESC keyword will reverse that
select weight, plot, year, month, genus, species from plot_surveys where genus = 'Onychomys' order by weight DESC;
weight plot year month genus species
56 4 1982 7 Onychomys leucogaster
56 14 1985 7 Onychomys leucogaster
51 5 1981 3 Onychomys leucogaster
49 3 1983 7 Onychomys leucogaster
Bonus: Use aggregate functions max() and min() to find that instead…
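A short sketch of ascending and descending sorts, using Python's built-in sqlite3 module and four invented weights. One weight is missing on purpose: SQLite sorts NULL before every other value, so empty weights come first in ascending order and last in descending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of plot_surveys.
conn.execute("CREATE TABLE plot_surveys (genus TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?)",
    [("Onychomys", 56), ("Onychomys", 5), ("Onychomys", None), ("Onychomys", 24)],
)

ascending = [w for (w,) in conn.execute(
    "SELECT weight FROM plot_surveys WHERE genus = 'Onychomys' ORDER BY weight"
)]
descending = [w for (w,) in conn.execute(
    "SELECT weight FROM plot_surveys WHERE genus = 'Onychomys' ORDER BY weight DESC"
)]

print(ascending)
print(descending)
```

Adding "and weight is not null" to the WHERE clause keeps the empty rows out of the way when looking for the lightest animal.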
Result in descending order
order by vs. max() or min()
What are the largest and smallest weights of Onychomys spp?
Use aggregate functions max() and min() to find that
select genus, species, max(weight), min(weight) from plot_surveys where genus = 'Onychomys' group by genus, species;
genus species max min
Onychomys leucogaster 56 10
Onychomys sp. 24 18
Onychomys torridus 46 5
Result doesn’t tell WHICH rows have the max/min, just the values
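A minimal sketch of max() and min() per species, using Python's built-in sqlite3 module. The weights echo the result table above, but the plot and year values are invented. The second query is one possible way (an assumption, not from the slides) to find which row holds the maximum, by comparing against a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of plot_surveys.
conn.execute(
    "CREATE TABLE plot_surveys (genus TEXT, species TEXT, weight INTEGER, plot INTEGER, year INTEGER)"
)
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?, ?, ?)",
    [
        ("Onychomys", "leucogaster", 56, 4, 1982),
        ("Onychomys", "leucogaster", 10, 7, 1990),
        ("Onychomys", "torridus", 46, 2, 1985),
        ("Onychomys", "torridus", 5, 3, 1979),
    ],
)

# One summary row per species: only the values, not where they came from.
summary = conn.execute("""
    SELECT species, max(weight), min(weight) FROM plot_surveys
    WHERE genus = 'Onychomys'
    GROUP BY genus, species ORDER BY species
""").fetchall()

# To see WHICH row is heaviest, filter on the maximum found by a subquery.
heaviest = conn.execute("""
    SELECT species, weight, plot, year FROM plot_surveys
    WHERE weight = (SELECT max(weight) FROM plot_surveys WHERE genus = 'Onychomys')
""").fetchall()

print(summary)
print(heaviest)
```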
Data Types
All columns must have a single “data type”, which is one of five or more categories
INTEGER. The value is a signed integer, ranging from -2**63 to (2**63 - 1) (9,223,372,036,854,775,807)
• You may see other types in other databases such as SMALLINT or BIGINT which make storing more efficient.
REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
• Other databases offer FLOAT, SINGLE (precision, 4 byte storage), DOUBLE (precision, 8 byte)
TEXT. The value is a text string, stored using the database encoding
• CHAR(N) = reserves n characters of storage, VARCHAR(n) variable storage sizes
• Let's not talk about encoding, that will take another hour
BLOB. Binary Large Object : not a number or text, a sound, a picture, stored exactly as it was input
DATE/TIME: most DBs have these formats
Handling Date/Time
Database vendors differ on how to store temporal values, but most have specific date and time types
For simplicity, SQLite does not have a dedicated type for dates/times
SQLite can store dates and times as characters, among other things.
Best practice
• store dates as Year (4 digit) - Month (2 digit) - day(2 digit). “2014-03-23”
• store times as Hours:Minutes:Seconds. “15:03:04”
Working with Dates
Goal : convert 3 date columns to single date for date calculations and comparisons.
SQLite uses special date functions. See https://www.sqlite.org/lang_datefunc.html This can be cumbersome but is most flexible.
To build a date:
select date(printf('%4d-%02d-%02d', year, month, day)) as "survey_date", plot, species_id from plot_surveys;
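A small sketch of why the zero-padded format matters, using Python's built-in sqlite3 module. The two survey dates are invented. Padded dates compare correctly as plain text and work with SQLite's julianday() function for date arithmetic; unpadded ones sort in the wrong order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of plot_surveys with the date in three integer columns.
conn.execute("CREATE TABLE plot_surveys (year INTEGER, month INTEGER, day INTEGER, plot INTEGER)")
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?, ?)",
    [(1977, 10, 15, 5), (1977, 3, 1, 2)],
)

dates = conn.execute("""
    SELECT date(printf('%4d-%02d-%02d', year, month, day)) AS survey_date
    FROM plot_surveys
    ORDER BY survey_date
""").fetchall()

# Date arithmetic: number of days between the two surveys.
days_between = conn.execute(
    "SELECT julianday('1977-10-15') - julianday('1977-03-01')"
).fetchone()[0]

# Text comparison: 1 means true, 0 means false.
padded_ok = conn.execute("SELECT '1977-03-01' < '1977-10-15'").fetchone()[0]
unpadded_ok = conn.execute("SELECT '1977-3-1' < '1977-10-15'").fetchone()[0]

print(dates)
print(days_between)
print(padded_ok, unpadded_ok)
```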
Multi-table databases
How to add group characteristic?
This data has info on all animals that were trapped. Sometimes Birds or Lizards were trapped and recorded
Goal: we want to view all rodent data, by excluding all non-rodents
Solution: create a table of species and add this field
Species information in plot_surveys table
Goal: list only species information in this table
select species_id, genus, species from plot_surveys order by genus, species;
The results repeat each species for every time it was trapped
Goal: show each species just one time
Solution: use the “distinct” keyword for unique data
select distinct species_id, genus, species from plot_surveys order by genus, species;
Note we could use ‘group by’ instead, but that is usually paired with an aggregate function
Bonus: save this as a view with create view species_present as
Create a new table and fill that table with species
Goal : be able to add columns to species for refined queries
Create table
create table species ( species_id text, genus text, species text, taxa text);
Add data. Can insert data from a select
insert into "species" (species_id, genus, species) select distinct species_id, genus, species from plot_surveys;
The next step is to enter all the taxa values for each species
We are going to cheat and skip to a database that is complete…
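A minimal sketch of the create and insert steps, using Python's built-in sqlite3 module and four invented survey rows. The UPDATE at the end is only an illustration of how taxa values could be entered; the class skips this step and downloads a finished database instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of plot_surveys with repeated species.
conn.execute("CREATE TABLE plot_surveys (species_id TEXT, genus TEXT, species TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO plot_surveys VALUES (?, ?, ?, ?)",
    [
        ("OT", "Onychomys", "torridus", 24),
        ("OT", "Onychomys", "torridus", 22),
        ("OL", "Onychomys", "leucogaster", 40),
        ("DM", "Dipodomys", "merriami", 45),
    ],
)

# New table, filled from a SELECT DISTINCT so each species appears once.
conn.execute("CREATE TABLE species (species_id TEXT, genus TEXT, species TEXT, taxa TEXT)")
conn.execute("""
    INSERT INTO species (species_id, genus, species)
    SELECT DISTINCT species_id, genus, species FROM plot_surveys
""")

# taxa was not part of the insert, so it starts out NULL for every species.
n_missing = conn.execute("SELECT count(*) FROM species WHERE taxa IS NULL").fetchone()[0]

# Illustrative fill-in of taxa (assumed values for this toy table).
conn.execute("UPDATE species SET taxa = 'Rodent' WHERE genus IN ('Onychomys', 'Dipodomys')")
species_rows = conn.execute(
    "SELECT species_id, taxa FROM species ORDER BY species_id"
).fetchall()

print(n_missing)
print(species_rows)
```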
Download Final version of Portal Survey DB
In the interest of time, we’ll download a database with species as a separate table
1. Please download : http://git.io/vcYSQ
file sql_class_final.sqlite3
2. Open with SQLiteStudio
In “Database” menu, select “connect to database”
Browse to this final db above
3. Confirm there are 3 tables in this database
R in RDBMS is for Relational
Goal : keep information simple and less redundant
E.g. Species is stored here as species_id, genus, species and taxa.
Using multiple related tables is where SQL really shines.
We only need one of these values; the others are redundant
We can create a “species” table to hold this information, and “link” the two tables together.
Create a new species table
Tables are linked on common “key”
species.species_id
surveys.species_id
Full Database Model
How do we ask questions about groups of food?
Summarize species using joins
Display data for grasshopper mice using a join with species table
Open SQL Editor window again and enter :
select weight, year, month, genus, species, taxa from surveys inner join species on surveys.species_id = species.species_id where species.genus = 'Onychomys';
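A minimal sketch of the inner join, using Python's built-in sqlite3 module. Both tables are tiny made-up stand-ins for the surveys and species tables in the final database; only the linking column species_id is taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the two linked tables.
conn.execute("CREATE TABLE species (species_id TEXT, genus TEXT, species TEXT, taxa TEXT)")
conn.execute("CREATE TABLE surveys (year INTEGER, month INTEGER, species_id TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?, ?)",
    [("OT", "Onychomys", "torridus", "Rodent"), ("DM", "Dipodomys", "merriami", "Rodent")],
)
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?, ?, ?)",
    [(1989, 3, "OT", 24), (1989, 4, "DM", 45), (1990, 5, "OT", 22)],
)

# Each survey row is matched to the one species row with the same species_id.
rows = conn.execute("""
    SELECT weight, year, month, genus, species, taxa
    FROM surveys
    INNER JOIN species ON surveys.species_id = species.species_id
    WHERE species.genus = 'Onychomys'
    ORDER BY year
""").fetchall()

for row in rows:
    print(row)
```

The genus, species, and taxa text is stored once in the species table but shows up on every matching survey row in the result.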
Summarize species using joins
Find the heaviest rodents
Display maximum weights for all rodent species
USING ALL TECHNIQUES YOU’VE LEARNED SO FAR
select genus, species, max(weight) as maxweight
from surveys
inner join species on surveys.species_id = species.species_id
where species.taxa = 'Rodent'
group by genus, species
order by maxweight DESC;
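The query combines a join, a filter, grouping, an aggregate and sorting. A small sketch in Python (standard-library sqlite3) makes each piece visible; the sample rows, including the bird species, are invented so the filter has something to exclude.

```python
import sqlite3

# Toy tables; weights are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table species (species_id text primary key, genus text,
                          species text, taxa text);
    create table surveys (record_id integer primary key,
                          species_id text, weight real);
    insert into species values
        ('DM', 'Dipodomys', 'merriami', 'Rodent'),
        ('OT', 'Onychomys', 'torridus', 'Rodent'),
        ('AB', 'Amphispiza', 'bilineata', 'Bird');
    insert into surveys values
        (1, 'DM', 40.0), (2, 'DM', 48.0),
        (3, 'OT', 24.0), (4, 'OT', 22.0),
        (5, 'AB', 19.0);
""")

query = """
    select genus, species, max(weight) as maxweight
    from surveys
    inner join species on surveys.species_id = species.species_id
    where species.taxa = 'Rodent'      -- drop non-rodents before grouping
    group by genus, species            -- one output row per species
    order by maxweight desc            -- heaviest first
"""
heaviest = conn.execute(query).fetchall()
print(heaviest)
```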
Cardinality
There is one record for each species, but each species occurs many times in surveys.
species -> surveys is one-to-many
Species were trapped many times in many traps.
species -> traps is many-to-many
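Cardinality can be checked by counting. The sketch below uses Python with the standard-library sqlite3 module; the column plot_id stands in for the trap location and is an assumption, as are the invented rows.

```python
import sqlite3

# Toy surveys table; plot_id is a hypothetical stand-in for trap location.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table surveys (record_id integer primary key,
                          species_id text, plot_id integer);
    insert into surveys values
        (1, 'DM', 2), (2, 'DM', 3), (3, 'OT', 2), (4, 'OT', 3), (5, 'DM', 2);
""")

# One-to-many: a single species has many survey records.
captures_per_species = conn.execute(
    "select species_id, count(*) from surveys "
    "group by species_id order by species_id"
).fetchall()

# Many-to-many: each species appears in several plots ...
plots_per_species = conn.execute(
    "select species_id, count(distinct plot_id) from surveys "
    "group by species_id order by species_id"
).fetchall()

# ... and each plot holds several species.
species_per_plot = conn.execute(
    "select plot_id, count(distinct species_id) from surveys "
    "group by plot_id order by plot_id"
).fetchall()

print(captures_per_species, plots_per_species, species_per_plot)
```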
Optional. Expanding the data
Goal: determine if there is a correlation between abundance (captures) and precipitation.
We need to prepare the data for analysis.
Rainfall data is available but needs to be imported into the database.
1. Read the meta-data and download the CSV file
2. Examine the contents to determine the table we need
3. Create a new table to hold the precipitation data
4. Import the data into this new table
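Once precipitation sits in its own table, the analysis-ready data is one join away. The following is a sketch in Python (standard-library sqlite3) under stated assumptions: a rainfall table with year, month and precipitation columns, -99 marking missing values, and invented sample rows.

```python
import sqlite3

# Toy data; capture rows and rainfall values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table surveys (record_id integer primary key,
                          year integer, month integer);
    create table rainfall (year integer, month integer, precipitation double);
    insert into surveys (year, month) values
        (1980, 1), (1980, 1), (1980, 1), (1980, 2), (1980, 3), (1980, 3);
    insert into rainfall values
        (1980, 1, 12.5), (1980, 2, -99), (1980, 3, 0.0);
""")

# Pair monthly capture counts with monthly precipitation,
# skipping months where precipitation is coded as missing (-99).
query = """
    select s.year, s.month, count(*) as captures, r.precipitation
    from surveys s
    inner join rainfall r on s.year = r.year and s.month = r.month
    where r.precipitation <> -99
    group by s.year, s.month, r.precipitation
    order by s.year, s.month
"""
paired = conn.execute(query).fetchall()
print(paired)
```

The result is a table of (year, month, captures, precipitation) rows that could be exported to a statistics package for the correlation itself.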
Optional Material
Introduction to SQL
Optional: Adding Rainfall data
Searching the metadata, we find there is associated precipitation data. Can we incorporate that?
Variable name   Variable definition                  Storage type   Variable codes, definitions, and notes
Year            Year data collected                  Integer        1980–1989
Month           Month data collected                 Integer        1–12
Precipitation   Precipitation amount in rain gauge   Double         Measurement unit: millimeters; -99 = missing data
Table adapted from http://esapubs.org/archive/ecol/E090/118/Portal_precipitation_metadata.htm
Download text file from github (TO DO: address)
Optional: adding rainfall data
Create table
create table rainfall (year integer, month integer, precipitation double);
Download data from
http://esapubs.org/archive/ecol/E090/118/Portal_precipitation_19801989.csv
Import using the SQLiteStudio Tools menu: Import…
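The import can also be scripted. Below is a minimal sketch in Python using the standard-library csv and sqlite3 modules. The header names (Year, Month, Precipitation) are assumed from the metadata table, and the inline text stands in for the downloaded file; check the real CSV before relying on either. Storing -99 as NULL is a design choice here, not something the source prescribes.

```python
import csv
import io
import sqlite3

# Stand-in for the downloaded CSV; header names are assumed, values invented.
csv_text = "Year,Month,Precipitation\n1980,1,12.5\n1980,2,-99\n1980,3,0\n"

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table rainfall (year integer, month integer, precipitation double)"
)

rows = []
for rec in csv.DictReader(io.StringIO(csv_text)):
    precip = float(rec["Precipitation"])
    # Store the -99 missing-data code as NULL so aggregates ignore it.
    rows.append((int(rec["Year"]), int(rec["Month"]),
                 None if precip == -99 else precip))

conn.executemany("insert into rainfall values (?, ?, ?)", rows)
conn.commit()

imported = conn.execute(
    "select year, month, precipitation from rainfall order by year, month"
).fetchall()
print(imported)
```

To read a real file, replace io.StringIO(csv_text) with an open file handle, for example open("Portal_precipitation_19801989.csv", newline="").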
Epilogue: About SQL Databases
About our software
SQLite: C-language library for working with single-file databases, with full SQL and blazing speed. Provides a command-line interface only. Maximum database size 140 terabytes, ~10^12 rows of data. https://www.sqlite.org/
SQLiteStudio: desktop application for working with SQLite-format files; includes the SQLite library
• http://sqlitestudio.pl
• Fullest featured of the free, cross-platform SQLite desktop programs
• Free, but a little funky
• Manual install from Zip or DMG required (see instructions)
Many other programs and connectors for using SQLite format files from Statistical, Office, Scripting languages, web, etc.
About SQLite
SQLite is a “database engine” to which you send SQL commands to work on a database.
The program that sends commands to the engine is called the client program. Ways to send SQL commands to the SQLite engine:
• Program like SQLiteStudio
• Script or other program
• SQLite’s stand-alone command line interface (CLI) (comes with a download of SQLite)
Most SQLite tutorials and books use this CLI for teaching. OS X comes with it, but Windows and Linux require an install.
Quick Demo
SQL is Declarative
The core of SQL is a declarative language.
• You state what you want and let the language processor figure out how to deliver the desired results.
• Analogy is a restaurant menu: you ask for what you want and you don't know how it's prepared
• Less conceptual burden, because no implementation burden
• Fixed command structure: SELECT stuff FROM something WHERE something LIKE 'radical'
• Focus on data, not process = quicker coding for results
Programming languages are imperative (C, Java, Python, etc.)
• Each step must be explicit
• Analogy is a recipe: instructions for preparation
• Process oriented, flexible
• Flow control: branching based on a previous condition (possible in declarative languages, but verbose)
• More coding and conceptual overhead
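The contrast can be sketched by answering one question both ways: what is the heaviest recorded weight for one species? The example uses Python with the standard-library sqlite3 module; the rows and the species code 'OT' are invented for illustration.

```python
import sqlite3

# Invented sample rows: (record_id, species_id, weight)
rows = [(1, 'DM', 40.0), (2, 'OT', 24.0), (3, 'OT', 22.0)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table surveys (record_id integer, species_id text, weight real)"
)
conn.executemany("insert into surveys values (?, ?, ?)", rows)

# Declarative: say what is wanted; the engine decides how to find it.
declarative = conn.execute(
    "select max(weight) from surveys where species_id = 'OT'"
).fetchone()[0]

# Imperative: spell out every step of the search.
imperative = None
for _record_id, species_id, weight in rows:
    if species_id == 'OT' and (imperative is None or weight > imperative):
        imperative = weight

print(declarative, imperative)
```

Both approaches give the same answer; the SQL version leaves the looping and comparison to the database engine.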
Program + SQL = Efficient
SQL is a great way to get data out of a system, so many programs for science and the web use SQL inside of scripts.
Pseudo code:
if (today.weekday?)
    sql = "select stuff from work_to_dos;"
else   // weekend!
    sql = "select stuff from home_to_dos;"
endif
daily_do_list = Db.getresults(sql)
print(daily_do_list)
Yes, this could be expressed in SQL alone if the to-do items were stored in a single categorized table and a 'case' expression in the where clause picked the category based on the date.
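A runnable sketch of the pseudo code above, in Python with the standard-library sqlite3 and datetime modules. The table and column names come from the pseudo code; the to-do items and the function name daily_do_list are invented for illustration.

```python
import datetime
import sqlite3

# Toy database with the two tables named in the pseudo code; items are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table work_to_dos (stuff text);
    create table home_to_dos (stuff text);
    insert into work_to_dos values ('review trapping data');
    insert into home_to_dos values ('mow the lawn');
""")

def daily_do_list(conn, day):
    """Return the to-do items for the given date."""
    # weekday() is 0 for Monday through 6 for Sunday.
    if day.weekday() < 5:
        sql = "select stuff from work_to_dos"
    else:  # weekend!
        sql = "select stuff from home_to_dos"
    return [row[0] for row in conn.execute(sql)]

print(daily_do_list(conn, datetime.date.today()))
```

The program handles the branching, and SQL handles fetching the data, which is the division of labor the slide describes.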
How to use SQL in a script
Goal: I’d like to use SQLite in a shell script for a workflow
Once SQLite is installed, you can use it at the command line/terminal as follows. Most books on SQLite use this technique.
sqlite3 sql_class_final.sqlite3 "select * from surveys where year=1977;"
Resources
http://sqlite.org for documentation and details
Nice book on SQLite with great detail:
https://www.safaribooksonline.com/library/view/using-sqlite/9781449394592/