Web Application Engineering
Data Modeling
Matthew Dailey
Information and Communication Technologies
Asian Institute of Technology
Matthew Dailey (ICT-AIT) Web Eng 1 / 54
Readings
Readings for these lecture notes:
- Greenspun, SQL For Web Nerds.
- Fowler, Patterns of Enterprise Application Architecture, Addison-Wesley, 2003.
- Ruby, Copeland, and Thomas, Agile Web Development with Rails, 6th edition, 2020.
These notes contain material © Greenspun, 2006; Fowler, 2003; Ruby, Copeland, and Thomas, 2020.
Outline
1 Introduction
2 SQL basics
3 Useful PostgreSQL features
4 Database normalization
5 Object-relational mapping
6 NoSQL (Mongo)
Matthew Dailey (ICT-AIT) Web Eng 3 / 54
Introduction
To this day, the RDBMS is the king of data storage.
NoSQL databases have important use cases (very large datasets, semi-structured or unstructured data, document-oriented processing), but these aren’t relevant for small and medium-sized applications.
We will thus learn (or review for some of you) how to use the RDBMS as an effective means of persistence for our Web applications.
Later, we will take a look at NoSQL databases such as MongoDB.
For all practical purposes, a “relational database is a big spreadsheet that several people can update simultaneously.” (Greenspun).
SQL basics: Tables
Each table in the database is a spreadsheet with fixed columns, each having a name and a data type. The rows are unordered. Example:
create table mailing_list (
    email varchar(100) not null primary key,
    name varchar(100)
);
The primary key constraint means the values in this column must be unique (and not null); in PostgreSQL it also causes a unique index to be created on the column.
Indices allow efficient search of one or more columns in a table.
SQL basics: Populating and modifying tables
We use SQL’s insert command to add data to a table:
insert into mailing_list ( name, email )
values ('Philip Greenspun','[email protected]');
We can add and drop columns (a not null column can only be added to a table that already has rows if it also has a default value):
alter table mailing_list add column phone_number varchar(20);
alter table mailing_list drop column phone_number;
For queries we use select:
select * from mailing_list;
SQL basics: Many-to-one relationships
Most folks have more than one phone number. Should we put a list in the phone number column? It might work, but our data would not be in “normal” form (more on normalization later).
For many-to-one relationships we normally use a separate table:
create table phone_numbers (
    email varchar(100) references mailing_list,
    phone_type char(1) check ( phone_type in
        ( 'W', 'H', 'M', 'F' )),
    phone_number varchar(20)
);
The keyword references creates a consistency constraint between the two tables. Try adding phone numbers for email addresses that are not in the mailing_list table.
OK, insert some data into the table.
SQL basics: Joins
A join combines information from more than one table:
select * from mailing_list, phone_numbers;
But we don’t get what we want: we get the cross product of the rows in the two tables. We have to be more selective:
select * from mailing_list, phone_numbers
where mailing_list.email = phone_numbers.email;
Other useful commands: delete from mailing_list and update mailing_list.
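The difference between the cross product and the selective join can be sketched in plain Ruby, with arrays of hashes standing in for the two tables. The sample rows are invented for illustration; a real RDBMS would not materialize the full cross product first.

```ruby
# In-memory stand-ins for the two tables (sample rows are made up).
mailing_list = [
  { email: "alice@example.com", name: "Alice" },
  { email: "bob@example.com",   name: "Bob" }
]
phone_numbers = [
  { email: "alice@example.com", phone_type: "W", phone_number: "555-0100" },
  { email: "alice@example.com", phone_type: "M", phone_number: "555-0101" },
  { email: "bob@example.com",   phone_type: "H", phone_number: "555-0102" }
]

# select * from mailing_list, phone_numbers;
# Every row of one table is paired with every row of the other.
cross_product = mailing_list.product(phone_numbers)

# ... where mailing_list.email = phone_numbers.email;
# Keep only the pairs whose email values match.
joined = cross_product.select { |person, phone| person[:email] == phone[:email] }

puts "cross product: #{cross_product.size} rows"
puts "join:          #{joined.size} rows"
joined.each do |person, phone|
  puts "#{person[:name]} #{phone[:phone_type]} #{phone[:phone_number]}"
end
```

With two people and three phone numbers, the cross product has six rows, of which only three survive the where clause.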
SQL basics: Data types
We saw a few of SQL’s data types already. Here is a more complete but still partial list, for PostgreSQL:
- Fixed-length strings (char(len))
- Variable-length strings (varchar(len))
- Variable-length strings, no limit on length (text)
- Variable-length binary data (bytea)
- Dates and times (date, time, timestamp)
- Numbers (integer, numeric, real, double precision, serial, others)
Other more complex, less-used types
SQL basics: Constraints
Values can also be constrained:
- not null
- unique
- primary key
- check
- references
That’s all you need for some simple data modeling!
SQL basics: Keys: natural or surrogate?
A key is an attribute or group of attributes that uniquely identifies a row of a table.
Composite keys are made up of more than one attribute.
Natural keys are attributes in the real world: citizen ID number, etc.
Surrogate keys are artificial keys introduced into the data model that have no relationship to the real-world entities being modeled.
Many analysts prefer natural keys because surrogate keys are artificial and unrelated to the business logic.
But natural keys may be coupled to the business logic and might therefore change when requirements change.
Most Web application frameworks are easiest to work with when you allow them to define their own surrogate key for every table.
Useful PostgreSQL features: User-defined functions
PostgreSQL provides the PL/pgSQL language for specification of user-defined functions. As a simple example, consider f(x) = 2x:
create or replace function doubleint( x integer )
returns integer as $$
declare y integer;
begin
y := 2 * x;
return y;
end;
$$ language plpgsql;
In PostgreSQL 9.0 and later, PL/pgSQL is installed by default. On older versions, before creating a first PL/pgSQL function in your database, you must use the shell command createlang plpgsql apache (use your database’s name instead of apache).
Now queries like select doubleint( 10 ); should work.
Useful PostgreSQL features: Triggers
PL/pgSQL functions returning trigger can be set to execute automatically when a table is changed.
Example: automatically create a change log entry every time a student changes projects:
create table project_changes (
studentid integer references students,
oldproj integer references projects,
newproj integer references projects,
update_timestamp timestamp
);
create or replace function proj_log() returns trigger as $PROC$
begin
if ( NEW.studentid = OLD.studentid and
NEW.projectid <> OLD.projectid ) then
insert into project_changes (
studentid, oldproj, newproj, update_timestamp
) values (
NEW.studentid, OLD.projectid, NEW.projectid,
current_timestamp
);
end if;
return NEW;
end;
$PROC$ language plpgsql;
drop trigger if exists proj_log_post on students;
-- OLD holds no row on insert, so fire on update only.
create trigger proj_log_post after update on students
for each row execute procedure proj_log();
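To make the control flow of the trigger concrete, here is a minimal plain-Ruby sketch, not PostgreSQL code. The hash and array standing in for the students and project_changes tables, and the update_student helper that plays the role of the database firing the trigger, are hypothetical names introduced only for illustration.

```ruby
# Hypothetical in-memory stand-ins for the students and project_changes tables.
students        = { 1 => { studentid: 1, projectid: 10 } }
project_changes = []

# Plays the role of proj_log(): compare the old and new versions of the row
# and append a log entry only when the project changed.
proj_log = lambda do |old_row, new_row|
  if new_row[:studentid] == old_row[:studentid] &&
     new_row[:projectid] != old_row[:projectid]
    project_changes << { studentid: new_row[:studentid],
                         oldproj: old_row[:projectid],
                         newproj: new_row[:projectid],
                         update_timestamp: Time.now }
  end
end

# Plays the role of the database: run the trigger after every row update.
update_student = lambda do |id, changes|
  old_row = students[id]
  new_row = old_row.merge(changes)
  students[id] = new_row
  proj_log.call(old_row, new_row)
end

update_student.call(1, { projectid: 20 })   # project changed: logged
update_student.call(1, { projectid: 20 })   # same project: not logged

puts project_changes.size
```

The point of doing this in the database rather than in application code is that the log entry is written no matter which client performs the update.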
Database normalization: Introduction
A normalized database only stores atomic data in a non-redundant form.
The concept of normal form for relational databases was proposed by E. F. Codd in 1970.
Normalizing a database means ensuring that all data in every table is atomic and depends only on the primary key for that table.
Normalization means all dependencies are explicit in the data model. This makes it easier to maintain the database in a consistent state.
There are many levels of normalization. The most important are first, second, and third normal form.
Database normalization: First normal form
Criteria for first normal form:
- All columns in every table are atomic (nondecomposable).
- Every row of every table has a unique primary key.
Example: conference program committee website:
- Papers are submitted by potential authors.
- Papers are reviewed by committee members (who can also be authors).
- The program chair makes acceptance and rejection decisions based on the reviews.
- Papers have an author list, a title, a list of keywords, a link to the PDF submission, a set of reviews, and a decision.
- Reviews have a single author, a paper being reviewed, comments, and ratings from 1–5 for technical quality, originality, and presentation.
Database normalization: First normal form
1NF procedure:
- Consider each relation and break non-atomic attributes into separate tables.
- Add the relationships between the tables.
- Determine the primary keys.
Database normalization: First normal form
For atomicity, we need separate tables for (at least):
- papers
- people
- keywords
- reviews
Relationships:
- Papers to authors: many to many. Requires a new table, papers_authors, relating the two.
- Papers to keywords: many to many. Requires a new table, papers_keywords, relating the two.
- Papers to reviews: one to many. Requires a foreign key reference in reviews.
- People to reviews: one to many. Requires a foreign key reference in reviews.
Database normalization: First normal form
Keys:
- papers: no natural key. Introduce surrogate paper_id.
- people: no natural key. Introduce surrogate person_id.
- keywords: the keyword itself must be unique, so it is a natural key.
- reviews: the paper, reviewer pair is unique. It is a natural (composite) key.
With a unique key for all tables, and only atomic data, our database is in first normal form.
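The 1NF decomposition for papers, authors, and keywords can be sketched in plain Ruby, with arrays of hashes standing in for tables. The sample submissions are invented, and matching people by name is a simplification made only to keep the sketch short (the notes above point out that people have no natural key).

```ruby
# Non-atomic records: authors and keywords are lists inside one "row".
# All sample data is invented for illustration.
submissions = [
  { title: "Fast Joins", authors: ["Ann", "Raj"], keywords: ["sql", "joins"] },
  { title: "Lazy ORMs",  authors: ["Raj"],        keywords: ["sql", "orm"] }
]

papers          = []  # paper_id (surrogate key), title
people          = []  # person_id (surrogate key), name
keywords        = []  # keyword (natural key)
papers_authors  = []  # paper_id, person_id
papers_keywords = []  # paper_id, keyword

submissions.each do |sub|
  paper_id = papers.size + 1
  papers << { paper_id: paper_id, title: sub[:title] }

  sub[:authors].each do |name|
    # Reuse the person row if this name was seen before (simplification).
    person = people.find { |p| p[:name] == name }
    unless person
      person = { person_id: people.size + 1, name: name }
      people << person
    end
    papers_authors << { paper_id: paper_id, person_id: person[:person_id] }
  end

  sub[:keywords].each do |word|
    keywords << { keyword: word } unless keywords.any? { |k| k[:keyword] == word }
    papers_keywords << { paper_id: paper_id, keyword: word }
  end
end

puts "papers: #{papers.size}, people: #{people.size}, keywords: #{keywords.size}"
puts "papers_authors: #{papers_authors.size}, papers_keywords: #{papers_keywords.size}"
```

Every value in every resulting row is atomic, and each many-to-many relationship lives in its own join table.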
Database normalization: Second normal form
Criteria for second normal form:
- The database is in 1NF.
- There should be no columns dependent on only part of a composite key.
Example: suppose we had a column reviewer_home_page in the reviews table. This would be atomic but redundant, and should be moved to the people table.
Database normalization: Third normal form
Criteria for third normal form:
- The database is in 2NF.
- There should be no columns dependent on non-key columns.
Example: suppose for each review, we have a field originality (an integer between 1 and 5) and originality_desc (“Groundbreaking”, “Novel”, “Somewhat new”, “Minor variation of existing work”, and “Complete ripoff”) describing what the rating means.
We can see that originality_desc depends directly on originality, which is not a key for reviews.
To achieve 3NF we should move originality_desc into a new table and make originality be a foreign key reference.
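A minimal plain-Ruby sketch of this 3NF step follows, with arrays of hashes standing in for tables. The review rows are invented, and the assumption that 5 means “Groundbreaking” and 4 means “Novel” is mine; the notes do not say which end of the scale is which.

```ruby
# Before: originality_desc is repeated in every review and depends on
# originality, not on the review's key (paper_id, reviewer_id).
# Assumption: 5 = "Groundbreaking", 4 = "Novel".
reviews_before = [
  { paper_id: 1, reviewer_id: 7, originality: 5, originality_desc: "Groundbreaking" },
  { paper_id: 2, reviewer_id: 7, originality: 5, originality_desc: "Groundbreaking" },
  { paper_id: 1, reviewer_id: 9, originality: 4, originality_desc: "Novel" }
]

# After: one row per rating value in a lookup table ...
originality_levels = reviews_before
  .map { |r| { originality: r[:originality], originality_desc: r[:originality_desc] } }
  .uniq

# ... and reviews keep only the rating, which now acts as a foreign key.
reviews = reviews_before.map { |r| r.reject { |k, _| k == :originality_desc } }

# A join recovers the description when it is needed.
def describe(review, levels)
  levels.find { |l| l[:originality] == review[:originality] }[:originality_desc]
end

puts describe(reviews.first, originality_levels)
```

After the split, changing the wording of a description means updating one row in the lookup table instead of every review that carries that rating.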
Database normalization: Denormalization
Normalization simplifies data updates and changes to the data model.
Normalization leads to more complex queries with many joins. This has implications for performance.
Databases that are primarily transactional should emphasize normalization.
Databases that are primarily read-only might use denormalization to improve performance and simplify the queries sent to the RDBMS.
The preferred denormalization technique is to use indexed views (the closest PostgreSQL equivalent is a materialized view).
If denormalization is done at the data model level, constraints should be used to ensure consistency of the redundant data.
Object-relational mapping: Introduction
Most SQL APIs return an array of hashes (one per row) or a similar structure in response to queries.
Ruby example: the Sequel database API provides a row abstraction for database rows.
Next page: Sequel example.
On Ubuntu, you need the pg and sequel gems.
You’ll also need a username with password for the database, as the connection is through a network socket. In psql run
alter user username with password 'password';
(note that you don’t put quotes around the username).
Object-relational mapping: Introduction
Ruby Sequel example (put in a text file such as db_access.rb and run from the command line using ruby db_access.rb):
require "sequel"
dbh = Sequel.connect(
  "postgres://mdailey:password@localhost/wae_students_development")
dbh[:students].each do |row|
  row.keys.each do |key|
    printf "%s: %s ", key, row[key]
  end
  print "\n"
end
Object-relational mapping: Introduction
In object-oriented analysis and design (OOAD) we normally construct a domain model containing the entities in the business domain.
If we are using OOAD and an object-oriented programming language like Ruby or Java, we want to work with objects, not database rows.
But what if we are stuck with an RDBMS? The simplest thing to do is to map database rows directly to domain model objects.
The Active Record pattern for enterprise applications is one of the simplest approaches to so-called object-relational mapping.
Object-relational mapping: Active Record
Active Record
An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.
Fowler (2003), Fig. 3.3
Object-relational mapping: Active Record
Some popular Active Record implementations:
- Ruby ActiveRecord (decoupled from Rails way back in version 3.0)
- CakePHP
- .NET Castle
There are many others.
Object-relational mapping: Active Record
What Active Record implementations can do for us:
- Automatically construct an instance of Active Record from a SQL result row.
- Automatically construct a SQL insert from a given instance of Active Record.
- Provide static finder methods via reflection that return Active Record instances.
- Map getters and setters to SQL selects and updates, transforming SQL data types to reasonable native types.
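To make the list above concrete, here is a toy Active Record written in plain Ruby. It is only a sketch of the pattern, not how Rails ActiveRecord is implemented: the class name StudentRecord, the ROWS hash standing in for a database table, and the fixed COLUMNS list are all hypothetical choices made for illustration.

```ruby
# Toy Active Record: each object wraps one row and knows how to save itself.
# ROWS is a hypothetical in-memory stand-in for the students table.
class StudentRecord
  ROWS = {}                       # id => row hash
  COLUMNS = [:name, :studentid]

  attr_reader :id
  attr_accessor(*COLUMNS)

  # Build an instance from a row hash (or from user-supplied attributes).
  def initialize(attrs = {})
    @id = attrs[:id]
    COLUMNS.each { |col| send("#{col}=", attrs[col]) }
  end

  # Plays the role of "insert" (new row) or "update" (existing row).
  def save
    @id ||= ROWS.size + 1
    ROWS[@id] = { id: @id, name: name, studentid: studentid }
    true
  end

  # Finder: construct an instance from a stored row.
  def self.find(id)
    row = ROWS[id] or raise KeyError, "no student with id #{id}"
    new(row)
  end

  # Finders generated from the column list, e.g. find_by_name.
  COLUMNS.each do |col|
    define_singleton_method("find_by_#{col}") do |value|
      row = ROWS.values.find { |r| r[col] == value }
      row && new(row)
    end
  end

  # Domain logic lives on the same object as the persistence code.
  def display_name
    "#{name} (#{studentid})"
  end
end

s = StudentRecord.new(name: "Matthew Dailey", studentid: 123456)
s.save
puts StudentRecord.find_by_name("Matthew Dailey").display_name
```

Note how the object's attributes mirror the table's columns one to one; that direct correspondence is what makes the pattern simple and also what ties the domain object to the schema.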
Object-relational mapping: Active Record
Some advantages of Active Record:
- It works very well when the domain model and business logic are simple.
Some disadvantages:
- It cannot handle complex mappings from objects to relations.
- It couples the domain logic to the database schema.
Object-relational mapping: Active Record in Rails
Some key features of Rails ActiveRecord:
- Object schema is constructed on the fly from the database schema.
- Transparent lazy fetching.
- Transparent optimistic locking via row versioning.
- Simple support for associations between classes.
- Transaction support.
- Validations.
- Value objects.
- Single table inheritance.
Object-relational mapping: Active Record in Rails
Conventions
- Each database table has a surrogate primary key, id.
- The model class name is singular and UpperCamelCase (e.g. Student); the table name is the plural form of the object name (e.g. students).
- Foreign key reference names are written classname_id.
- Join tables for many-to-many associations are named for the two tables they join, e.g., projects_students.
Default behavior can be changed as necessary (e.g. invoke the class method set_table_name to use a non-standard table name; in current Rails versions, assign self.table_name instead).
Object-relational mapping: Active Record in Rails: one-to-many associations
Example: students and their projects. Domain model:
[Figure: class diagram. Student (studentid: integer, name: string) and Project (name: string, url: string); each Student has one project, and each Project has many students.]
Corresponding database schema:
[Figure: database schema diagram]
Object-relational mapping: Active Record in Rails: one-to-many associations
After creating the database tables (through a direct admin tool or via Rails migrations), we create the model classes:
app/models/project.rb:
class Project < ActiveRecord::Base
  has_many :students
end
app/models/student.rb:
class Student < ActiveRecord::Base
  belongs_to :project
end
The method calls belongs_to and has_many set up the one-to-many relationship between projects and students.
Other methods for associations include has_one and has_and_belongs_to_many.
Object-relational mapping: Active Record in Rails: one-to-many associations
To thoroughly test your ActiveRecord classes, it’s easiest to work from the console. Try the following for an example (in Rails 2, the console command is script/console):
% rails console
>> s=Student.create
>> s.project = Project.create :name => "Soi Cats and Dogs",
     :url => "web13.cs.ait.ac.th"
>> s.name = "Matthew Dailey"
>> s.studentid = 123456
>> s.save
>> s = Student.find(1)
>> Project.all
>> s = Student.find_by_name("Matthew Dailey")
>> s = Student.find_by_name_and_studentid( "Matthew Dailey", 123456 )
>> Student.find_by_sql( "select * from students where students.name like 'Matt%'" )
You might want to run tail -f log/development.log in another terminal to watch the SQL being generated.
Note that new creates an instance in memory only, but create creates an instance and commits it to the database.
Object-relational mapping: Active Record in Rails: many-to-many associations
Now for a many-to-many relationship.
Suppose I need to record information about peer evaluations of your projects.
We need to set up a many-to-many relationship between students and projects.
Since the association has an attribute (the score), we have to create an ActiveRecord model for the join table [1] (in Rails 2, the generator command is script/generate):
% rails generate model ProjectEvaluation score:integer project:references \
    student:references
[1] There is an ActiveRecord method has_and_belongs_to_many that may be more convenient if you don’t need any attributes on the association.
Object-relational mapping: Active Record in Rails: many-to-many associations
In app/models/project_evaluation.rb, add:
class ProjectEvaluation < ActiveRecord::Base
  belongs_to :project
  belongs_to :student
end
To the Student and Project models, add the method call
has_many :project_evaluations
That’s it! From the console, try
s = Student.find(1)
p = Project.find(2)
pe = ProjectEvaluation.create :student => s, :project => p, :score => 3
s.project_evaluations
Lastly, try adding
has_many :evaluators, :through => :project_evaluations, :source => :student
to the Project model (what is the purpose of this?).
Object-relational mapping: Active Record in Rails: transactions
Oftentimes it will be important to group multiple database operations into a single atomic transaction.
For example:
% rails generate model Student name:string account_balance:float
% rake db:migrate
% rails console
>> bill = Student.create :name => 'Bill G', :account_balance => 10000000.0
>> matt = Student.create :name => 'Matt D', :account_balance => 100.0
>> bill.account_balance -= 10000
>> matt.account_balance += 10000
>> bill.save
>> matt.save
If an exception occurs while saving the second updated student, Bill G. loses 10,000 baht that Matt D. never receives.
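Object-relational mapping: Active Record in Rails: transactions
It would be safer to encapsulate both operations in a transaction:
>> Student.transaction do
?> bill.save!
>> matt.save!
>> end
If any exception occurs during the transaction, it is rolled back. We use save! rather than save here because save returns false on a validation failure instead of raising an exception, so it would not trigger a rollback.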
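The all-or-nothing behavior of a transaction block can be sketched in plain Ruby. This is only an illustration of the idea under simplifying assumptions: TinyDB is a hypothetical in-memory store, and restoring a copied snapshot is a stand-in for the rollback mechanism of a real RDBMS.

```ruby
# Hypothetical in-memory "database": account name => balance.
class TinyDB
  attr_reader :balances

  def initialize(balances)
    @balances = balances
  end

  # Run the block; if it raises, restore the state from before the block
  # and re-raise so the caller still sees the failure.
  def transaction
    snapshot = @balances.dup
    yield
  rescue StandardError
    @balances = snapshot
    raise
  end
end

db = TinyDB.new("Bill G" => 10_000_000.0, "Matt D" => 100.0)
fail_midway = true   # simulate a failure between the two writes

begin
  db.transaction do
    db.balances["Bill G"] -= 10_000
    raise "connection lost" if fail_midway
    db.balances["Matt D"] += 10_000
  end
rescue RuntimeError => e
  puts "rolled back: #{e.message}"
end

puts db.balances["Bill G"]
puts db.balances["Matt D"]
```

Because the failure happens inside the block, the debit from Bill G is undone along with everything else, so neither balance changes.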
Object-relational mapping: Active Record in Rails: optimistic locking
Transactions, except with strict serializable isolation (the highest level of isolation provided in the SQL standard, which locks data read by any transaction), don’t help with the problem of lost updates.
Consider the following code executing concurrently in two threads:
Thread 1
s = Student.find_by_name "Bill G"
s.account_balance += 1000000
s.save
Thread 2
s = Student.find_by_name "Bill G"
s.account_balance += 1000000
s.save
What should happen, and what actually happens, with snapshot isolation and serializable isolation?
Note that PostgreSQL versions before 9.1 do not support full serializable isolation; 9.1 and later implement it using serializable snapshot isolation (SSI).
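One problematic interleaving of the two threads can be replayed deterministically in plain Ruby. This is a sketch, not Rails code: the table hash and the read_row and write_row helpers are hypothetical stand-ins for the database row and for what find_by_name and save do, and the balance starts at 0 for simplicity.

```ruby
# Hypothetical one-row "table" shared by both sessions.
table = { "Bill G" => { account_balance: 0 } }

# Each session reads a private copy of the row, changes the copy,
# then writes the copy back.
def read_row(table, name)
  table[name].dup
end

def write_row(table, name, row)
  table[name] = row
end

# Interleaving: both sessions read before either one writes.
s1 = read_row(table, "Bill G")
s2 = read_row(table, "Bill G")

s1[:account_balance] += 1_000_000
s2[:account_balance] += 1_000_000

write_row(table, "Bill G", s1)
write_row(table, "Bill G", s2)   # overwrites the first session's update

puts table["Bill G"][:account_balance]
```

Two deposits of 1,000,000 were made, but the stored balance reflects only one of them: the first update is lost.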
Object-relational mapping: Active Record in Rails: optimistic locking
Optimistic locking means we allow concurrent users to perform any action they like but track updates to the database.
When one user attempts to update an old version of a record, an exception and transaction rollback should occur.
In Rails, optimistic locking can be enabled on any ActiveRecord class by adding a version column to the database table:
alter table students add column lock_version int default 0;
The versions are transparently updated and checked by the ActiveRecord base class.
Try the concurrent access scenario again with this change.
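The version check behind optimistic locking can be sketched in plain Ruby. This is an illustration of the idea rather than the Rails implementation: the table hash, the helpers, and the StaleRowError class are hypothetical (Rails raises ActiveRecord::StaleObjectError in the corresponding situation), and a real database must perform the check and the write as one atomic statement.

```ruby
# Raised when a writer holds an out-of-date copy of the row.
# Local stand-in for ActiveRecord::StaleObjectError.
class StaleRowError < StandardError; end

table = { "Bill G" => { account_balance: 0, lock_version: 0 } }

def read_row(table, name)
  table[name].dup
end

# Write only if the stored version still matches the version we read,
# and bump the version on success.
def write_row(table, name, row)
  stored = table[name]
  if stored[:lock_version] != row[:lock_version]
    raise StaleRowError, "#{name} was changed by someone else"
  end
  table[name] = row.merge(lock_version: row[:lock_version] + 1)
end

# Same interleaving as before: both sessions read, then both write.
s1 = read_row(table, "Bill G")
s2 = read_row(table, "Bill G")
s1[:account_balance] += 1_000_000
s2[:account_balance] += 1_000_000

write_row(table, "Bill G", s1)       # succeeds, version becomes 1
begin
  write_row(table, "Bill G", s2)     # s2 still carries version 0
rescue StaleRowError
  s2 = read_row(table, "Bill G")     # re-read the current row and retry
  s2[:account_balance] += 1_000_000
  write_row(table, "Bill G", s2)
end

puts table["Bill G"].inspect
```

The second writer is told its copy is stale instead of silently overwriting the first update; after re-reading and retrying, both deposits are reflected in the balance.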
Object-relational mapping: Active Record in Rails
We’ve covered many of the features of Rails’ implementation of Active Record. There are a few others of note:
Value objects
Single-table inheritance
Polymorphic associations
Object-relational mapping: Data Mapper
Active Record maps directly between database tables and domain objects.
Data Mapper is an alternative pattern that decouples the domain model from the database schema.
Data Mapper
A layer of mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself.
Fowler (2003), Fig. 3.4
Object-relational mapping: Data Mapper
Data Mapper is widely implemented:
- Hibernate for Java
- MassiveJS and many others for JavaScript
- SQLAlchemy for Python
- DataMapper for Ruby
Even if there is no existing implementation for your preferred environment, it is easy to roll your own, starting small and gradually improving the implementation over time.
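As a rough idea of what “starting small” might look like, here is a minimal hand-rolled Data Mapper in plain Ruby. It is a sketch under stated assumptions: the Student and StudentMapper classes are hypothetical, and an in-memory array of row hashes stands in for the database table. The column names deliberately differ from the attribute names to show that only the mapper knows about the schema.

```ruby
# Domain object: plain Ruby, no knowledge of tables or columns.
class Student
  attr_accessor :id, :full_name, :student_number

  def initialize(full_name:, student_number:, id: nil)
    @id = id
    @full_name = full_name
    @student_number = student_number
  end
end

# Mapper: the only place that knows how a Student maps onto a row.
# The "table" is a hypothetical in-memory array of row hashes.
class StudentMapper
  def initialize(rows = [])
    @rows = rows
  end

  # Translate the object into a row and store it.
  def insert(student)
    student.id = @rows.size + 1
    @rows << { "id" => student.id, "name" => student.full_name,
               "studentid" => student.student_number }
    student
  end

  # Look up a row and translate it back into an object (nil if absent).
  def find(id)
    row = @rows.find { |r| r["id"] == id }
    row && to_object(row)
  end

  private

  def to_object(row)
    Student.new(id: row["id"], full_name: row["name"],
                student_number: row["studentid"])
  end
end

mapper = StudentMapper.new
mapper.insert(Student.new(full_name: "Matthew Dailey", student_number: 123456))
puts mapper.find(1).full_name
```

If a column is renamed, only StudentMapper changes; the Student class and the code that uses it stay the same, which is the decoupling the pattern is after.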
NoSQL (Mongo): Introduction
Applications dealing with “big” data:
- High volume: we need to store millions or billions of records.
- High velocity: the data are arriving and need to be processed at a very high rate such as thousands of records per minute.
- High variety: we have potentially many sources providing data that are structured, unstructured, and semi-structured.
Under these conditions, designing schemas, migrating every time we add a new data source or data format, ensuring consistency, and guaranteeing isolated transactions may all be bottlenecks.
A possible solution: throw away your schemas, your consistency rules, and/or your isolated transactions!
[Think about where SQL and NoSQL would be best used: a banking application and a Facebook post analysis engine.]
NoSQL (Mongo): Types of NoSQL databases
There are several types of NoSQL databases:
- Key-value: dictionaries wherein values are indexed by a single key
- Document: key-value databases in which the value is a document represented in JSON, XML, etc.
- Wide column: row-oriented tables with dynamic columns
- Graph: data are nodes with edges
MongoDB is probably the most popular NoSQL database. It is document oriented.
NoSQL (Mongo): MongoDB features
Besides simple key-value storage and retrieval, MongoDB adds
- Sharding: distributing the data across multiple machines for high throughput
- Replication, duplication, load balancing for high availability at scale
- Document validations: imposing consistency rules where necessary
- Fine-grained locking: reader and writer locks at the global, database, or collection level to deal with concurrency issues.
NoSQL (Mongo): Quick MongoDB tutorial
To get a feel for MongoDB, first install it:
$ sudo apt install mongodb
Start a shell:
$ mongo
MongoDB shell version v3.6.8
connecting to: mongodb://127.0.0.1:27017
Implicit session: session { "id" : UUID("fca1c52f-ca00-4819-9821-7f9576077b33") }
MongoDB server version: 3.6.8
Server has startup warnings:
...
>
Figure out what db we’re connected to:
> db
test
NoSQL (Mongo): Quick MongoDB tutorial
Switch to the studentdb database:
> use studentdb
Insert some data into a new collection:
> db.projects.insertMany([
... { name: "Soi Cats and Dogs", url: "http://scad.org" },
... { name: "ICT Infosystem", url: "http://ict-info.ait.ac.th" }
... ])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("612eba9d84616b17e76630a4"),
ObjectId("612eba9d84616b17e76630a5")
]
}
>
Search the collection for a document:
> db.projects.find({name: "Soi Cats and Dogs"})
{ "_id" : ObjectId("612eba9d84616b17e76630a4"), "name" : "Soi Cats and Dogs", "url" : "http://scad.org" }
>
NoSQL (Mongo): Quick MongoDB tutorial
Generally we should avoid references where possible, but when we need one document to refer to another, we can use the _id field:
> var project = db.projects.find({name: "Soi Cats and Dogs"}).next();
> project
{
"_id" : ObjectId("612eba9d84616b17e76630a4"),
"name" : "Soi Cats and Dogs",
"url" : "http://scad.org"
}
> db.students.insertMany([
... { name: "Matt Dailey", studentid: "123456", project_id: project._id },
... { name: "Bishal Khanal", studentid: "123457", project_id: project._id }
... ]);
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("612ecf3a84616b17e76630a6"),
ObjectId("612ecf3a84616b17e76630a7")
]
}
NoSQL (Mongo): Quick MongoDB tutorial
> db.students.find()
{ "_id" : ObjectId("612ecf3a84616b17e76630a6"), "name" : "Matt Dailey",
"studentid" : "123456", "project_id" : ObjectId("612eba9d84616b17e76630a4") }
{ "_id" : ObjectId("612ecf3a84616b17e76630a7"), "name" : "Bishal Khanal",
"studentid" : "123457", "project_id" : ObjectId("612eba9d84616b17e76630a4") }
Things to note here:
- The shell interprets our input as JavaScript.
- The find() method returns a cursor, i.e., an object that has to be iterated to extract its data.
- The cursor’s next() method returns the next record in the cursor’s underlying collection. Use code such as
while (cursor.hasNext()) {
var record = cursor.next();
printjson(record);
}
to iterate over the query’s results.