September 23, 2015 Sam Siewert
CS317 File and Database Systems
Lecture 5, Part-2 – ORDBMS http://www.ibmbigdatahub.com/video/ibm-big-data-minute-drowning-petabytes
SQL Theory and Standards
DBMS Design (Connolly-Begg Chapter 10)
Part-2 Development Lifecycle
For Discussion… Big Data – Velocity, Volume, Variety, Veracity [2014]
1. Daily – 2.5 quintillion bytes (2,500,000,000,000,000,000), about 2.5 Exabytes (roughly 2.2 EiB), or 46,566,128 50 GB Blu-Ray Discs [counting 1 GB as 2^30 bytes], IBM Estimate
2. Annually – With a global population of 7.5 billion, each person produces/consumes 2.25 unique Blu-Rays per Year, or 23 DVDs (assuming even distribution, which is unlikely)
3. Annually – If produced/consumed by the US population alone, 53 Blu-Rays per Year, or 564 DVDs per person
4. Data in Total is 40 trillion gigabytes, or 800 billion Blu-Rays, for just over 100 (unique) Blu-Rays per person globally
5. Data by Powers of 10 and 2 – 2^64 bytes is 16 Exbibytes (EiB) of Addressable Data [64-bit PC limit]
6. Maximum Data Velocity is 100 Gbps, the Fastest Ethernet [assuming 8b/10b encoding, i.e. 10 bits per byte, 10 billion bytes per second]
7. How Much Data is Truly Unique vs. Duplicated?
8. What is the Quality (Veracity) of this Data?
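Items 1 to 6 are back-of-envelope arithmetic, so they can be re-derived. The sketch below is a minimal check, assuming a 50 GiB (50 × 2^30 byte) Blu-Ray, which is what the 46,566,128 figure implies, and a hypothetical round US population of 320 million (not stated on the slide). Item 4 is recomputed with decimal gigabytes, as the slide appears to use there. The results land close to the slide's numbers (about 2.27 rather than 2.25 Blu-Rays per person, for example).

```python
# Back-of-envelope check of the Big Data figures.
# Assumptions (not from the slide): Blu-Ray = 50 GiB, US population = 320 million.

DAILY_BYTES = 2.5e18           # IBM estimate: 2.5 quintillion bytes per day
BLURAY_BYTES = 50 * 2**30      # 50 GB disc, counted in binary units
WORLD_POP = 7.5e9
US_POP = 320e6                 # hypothetical round figure

# 1. Whole discs per day
discs_per_day = int(DAILY_BYTES // BLURAY_BYTES)

# 2./3. Unique discs per person per year
discs_per_year = DAILY_BYTES * 365 / BLURAY_BYTES
per_person_world = discs_per_year / WORLD_POP
per_person_us = discs_per_year / US_POP

# 4. Total data, taking 1 GB as 10**9 bytes
total_discs = 40e12 / 50       # 40 trillion GB / 50 GB per disc
total_per_person = total_discs / WORLD_POP

# 5. 64-bit address space in exbibytes (2**60 bytes each)
addressable_eib = 2**64 // 2**60

# 6. 100 Gbps at 10 line bits per byte (8b/10b)
bytes_per_second = 100e9 / 10

print(discs_per_day, round(per_person_world, 2), round(per_person_us),
      round(total_per_person), addressable_eib, bytes_per_second)
```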
Big Data Volume and Velocity Can Be Estimated as Shown
– Disk drives shipped and in use
– Online data only, or removable and archive media as well?
– Bit-rot (media eventually fails; storage lifetime is limited)
Variety Depends on the Level of Data Duplication
– Enterprise Storage System Deduplication, e.g. EMC Deduplication
– Internet Archive [petabytes] and Wayback Machine, Library of Congress http://www.loc.gov/about/general-information/ [traditional volumes], Stanford Digital Repository, National Archives, National A/V Conservation
Veracity, perhaps the Most Challenging Part
– Is the Data Correct, i.e. Not Corrupted?
– Is it Valid, i.e. From a Known, Trusted Source, and Does it Correspond to its Metadata Description?
– Has the Data Been Processed, and if so, How?
– Is it Raw Data (from a sensor, user, other)?
– Veracity is difficult, e.g. http://berkeleyearth.org/about-data-set
Quiz #2
Let’s Go Over it …
Quiz #2 Average was 68.3, Std. Deviation was 17.5 – Primarily Need to Study the Book More
Quiz #1 Average was 81.5, Std. Deviation was 8.5 (Ideal) – Mostly from In-Class Notes
Let's Go Over Solutions Now with Book Citations
– Solutions Provide References Back to the Book
– Posted on Canvas as Well
Quiz #2 - Review
Equi-join is a specific type of Theta-Join where the Predicate tests for EQUIVALENCE ONLY
Review BOOK citations for Correct Answer Carefully before Next Quiz and Exam
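The difference between an equi-join and a general theta-join is only in the join predicate. A minimal sketch, using Python's built-in sqlite3 as a stand-in for MySQL; the Staff, Branch, and SalaryBand tables and their rows are hypothetical examples, not from the book.

```python
import sqlite3

def demo_joins():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT, salary INTEGER);
        CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
        CREATE TABLE SalaryBand (band TEXT, lo INTEGER, hi INTEGER);
        INSERT INTO Staff VALUES ('SG37','B003',12000), ('SL21','B005',30000), ('SA9','B007',9000);
        INSERT INTO Branch VALUES ('B003','Glasgow'), ('B005','London'), ('B007','Aberdeen');
        INSERT INTO SalaryBand VALUES ('low',0,10000), ('mid',10000,25000), ('high',25000,100000);
    """)
    # Equi-join: the predicate tests for equality ("=") only
    equi = conn.execute("""
        SELECT s.staffNo, b.city FROM Staff s JOIN Branch b
          ON s.branchNo = b.branchNo
        ORDER BY s.staffNo""").fetchall()
    # Theta-join: the predicate uses other comparison operators (>=, <)
    theta = conn.execute("""
        SELECT s.staffNo, g.band FROM Staff s JOIN SalaryBand g
          ON s.salary >= g.lo AND s.salary < g.hi
        ORDER BY s.staffNo""").fetchall()
    conn.close()
    return equi, theta

equi, theta = demo_joins()
print(equi)
print(theta)
```

Every equi-join is a theta-join whose operator happens to be "="; the second query cannot be written as an equi-join because each salary must fall inside a range.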
See pp. 119 and 132 for the five fundamental operations: 1) Selection [Restriction], 2) Projection [Projection], 3) Union [Join – Specific Union], 4) Set Difference [Codd Omits], 5) Cartesian Product [Permutation]
Encouraged! See Class Notes and Examples of TC, RA, and Use of DISTINCT
Required [Except Intersection]
Pearson Education © 2014
Intersection can be composed as R – (R – S)
Nice to Have! Relational Algebra Operations Composed from the Required Ones
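Because intersection is not one of the required operations, it helps to see it built from set difference alone. A minimal sketch using Python sets to model duplicate-free relations; the relations R and S over (staffNo, branchNo) are hypothetical.

```python
def intersect_via_difference(r: set, s: set) -> set:
    """Relational intersection built only from set difference: R - (R - S)."""
    # R - S keeps tuples only in R; removing those from R leaves tuples in both
    return r - (r - s)

# Hypothetical relations over (staffNo, branchNo)
R = {("SG37", "B003"), ("SL21", "B005"), ("SA9", "B007")}
S = {("SL21", "B005"), ("SG5", "B003")}

print(intersect_via_difference(R, S))
```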
PK, FK EQUIVALENCE
The Book Says that EQUIVALENCE for an Equi-Join is a Predicate that Uses “=” – p. 126 (bottom)
This is Simplistic, especially for Multi-table Joins and PKs Formed from More than One Attribute
– E.g. if (X == Y) Can in Fact Involve a Complex Comparison
– E.g. if X is a vector = [1, 1, 3] and Y is a vector, then EQUIVALENCE requires Comparison of Each Component: if ((X[0] == Y[0]) && (X[1] == Y[1]) && (X[2] == Y[2]))
Likewise, Consider Simple Tuples of FirstName, LastName, DoB [PK = FirstName, LastName] and Another Relation [FK = FirstName, LastName] with Street Address, City, Zipcode
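The FirstName, LastName example can be tried directly. A minimal sketch in Python's sqlite3 (standing in for MySQL); the Person and Address rows are hypothetical. It shows that an equi-join on a composite key must compare every key component, just like the component-by-component vector test, and that matching on only part of the key over-matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (
        FirstName TEXT, LastName TEXT, DoB TEXT,
        PRIMARY KEY (FirstName, LastName));
    CREATE TABLE Address (
        FirstName TEXT, LastName TEXT, Street TEXT, City TEXT, Zipcode TEXT,
        FOREIGN KEY (FirstName, LastName) REFERENCES Person (FirstName, LastName));
    INSERT INTO Person VALUES ('Ann','Lee','1990-01-01'), ('Bob','Lee','1985-05-05');
    INSERT INTO Address VALUES ('Ann','Lee','1 Main St','Prescott','86301'),
                               ('Bob','Lee','2 Oak Ave','Daytona Beach','32114');
""")

# Correct equi-join: every component of the composite key must match
full_key = conn.execute("""
    SELECT p.FirstName, p.DoB, a.City FROM Person p JOIN Address a
      ON p.FirstName = a.FirstName AND p.LastName = a.LastName
    ORDER BY p.FirstName""").fetchall()

# Comparing only part of the key: each Lee pairs with every Lee address
partial_key = conn.execute("""
    SELECT p.FirstName, a.City FROM Person p JOIN Address a
      ON p.LastName = a.LastName""").fetchall()

print(full_key)
print(len(partial_key))
conn.close()
```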
Join Cheat Sheet http://www.codeproject.com/KB/database/Visual_SQL_Joins/Visual_SQL_JOINS_orig.jpg
JOINS You Must Know
MySQL Join Support – Inner, Cross, Left, Right, Outer (Left and Right only; MySQL has no FULL OUTER JOIN), Natural, Multi-table with Predicates (Theta and Equi-Join)
– Cross-Join [p. 171, Matches Theory p. 126]
– Theta-Join [p. 170 – 3 Table Join]
– Equi-Join [pp. 168-169]
– Natural-Join (Rarely Used, but Matches Theory on p. 127)
– Inner-Join (Not in Book! But Common in MySQL)
– Alternative Form – Nested Queries [p. 164]
Other Joins You are Not Responsible For (Less Useful)
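The join types listed above can be compared side by side on two tiny tables. A minimal sketch using Python's sqlite3 as a stand-in for MySQL (the CROSS, INNER ... ON, NATURAL, and LEFT JOIN syntax shown is accepted by both); the Staff and Branch rows are hypothetical, with SX1 deliberately assigned to a branch that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT);
    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
    INSERT INTO Staff VALUES ('SG37','B003'), ('SL21','B005'), ('SX1','B999');
    INSERT INTO Branch VALUES ('B003','Glasgow'), ('B005','London');
""")

def q(sql):
    return conn.execute(sql).fetchall()

# Cross-join: Cartesian product, 3 Staff rows x 2 Branch rows
cross = q("SELECT * FROM Staff CROSS JOIN Branch")
# Inner (equi-)join with an explicit predicate
inner = q("""SELECT s.staffNo, b.city FROM Staff s INNER JOIN Branch b
             ON s.branchNo = b.branchNo ORDER BY s.staffNo""")
# Natural join: implicit equality on every same-named column (branchNo)
natural = q("SELECT staffNo, city FROM Staff NATURAL JOIN Branch ORDER BY staffNo")
# Left (outer) join keeps unmatched Staff rows, padding Branch columns with NULL
left = q("""SELECT s.staffNo, b.city FROM Staff s LEFT JOIN Branch b
            ON s.branchNo = b.branchNo ORDER BY s.staffNo""")
# Nested-query alternative: staff at London branches, no JOIN keyword
nested = q("""SELECT staffNo FROM Staff
              WHERE branchNo IN (SELECT branchNo FROM Branch WHERE city = 'London')""")

print(len(cross), inner, natural, left, nested, sep="\n")
```

The inner and natural joins return the same rows here only because branchNo is the sole shared column name; a natural join silently changes meaning if another same-named column is added later, which is one reason it is rarely used.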
Connolly-Begg Chapter 9
ORDBMS Extensions to SQL (SQL:2011)
Part-2
Unstructured Data
BLOBs – Binary Large Objects
– Images
– Digital Video and Audio
– Digital Media
– Binary Data (Documents and Code), Perhaps Proprietary
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/Moose-to-Skeleton.png
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/Sled-Dogs.jpg
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/korean-air-profile.jpg
CLOBs – Character Large Objects
– Log Files and Traces (IT)
– Transaction Logs
– XML, HTML, XDS, etc. [Web documents, typically via HTTP, HTTPS]
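Storing large objects is mostly a matter of column type. A minimal sketch using Python's sqlite3; note that SQLite has BLOB but no CLOB type, so TEXT stands in for the SQL-standard CLOB here. The table names, the file name, and the log contents are hypothetical, and only the 8-byte PNG signature is stored rather than a whole image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Binary large object column, and a TEXT column standing in for a CLOB
conn.execute("CREATE TABLE Media (name TEXT PRIMARY KEY, data BLOB)")
conn.execute("CREATE TABLE LogFile (name TEXT PRIMARY KEY, body TEXT)")

png_signature = b"\x89PNG\r\n\x1a\n"                  # first 8 bytes of any PNG file
log_text = "2015-09-23 10:00:01 INFO start\n" * 1000   # hypothetical log contents

# Python bytes are stored as BLOB, Python str as TEXT
conn.execute("INSERT INTO Media VALUES (?, ?)", ("moose.png", png_signature))
conn.execute("INSERT INTO LogFile VALUES (?, ?)", ("app.log", log_text))

blob, blob_type = conn.execute(
    "SELECT data, typeof(data) FROM Media WHERE name = ?", ("moose.png",)).fetchone()
clob_len, clob_type = conn.execute(
    "SELECT length(body), typeof(body) FROM LogFile WHERE name = ?", ("app.log",)).fetchone()
conn.close()

print(blob_type, len(blob), clob_type, clob_len)
```

The database stores the bytes opaquely; any meaning (image format, log structure) lives in the application, which is why BLOBs and CLOBs count as unstructured data.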
OO Concepts – “Real World”
OOA – Object Oriented Analysis
– Define Class Hierarchies (Abstract Classes with Attributes), Interfaces (Public, Private), and Methods (Operations)
– Inheritance and Multiple Inheritance
OOD – OO Design
– Encapsulation of Methods with Data (Attributes) for Abstract and Derived Classes
– Instantiation and Use of Objects [Use Cases]
OOP – Object Oriented Programming (Java, C++, …)
– Programming Language – Direct Implementation of OOD
– Implementation of Re-usable OO Code Libraries
– Boost – http://www.boost.org/, OpenCV [C++ version], Many More … in other OOPLs
Classes Useful in Real World
– E.g. Biology – Kingdom, Phylum, Class, Order, Family, Genus, Species [Multiple Inheritance Examples], Proven Use
– Parts – Components compose Sub-system(s), which compose System(s), which compose a System of Systems
Supports Re-Use of Objects Instantiated from a Class Hierarchy
Multiple Inheritance – Odd?
Classes Can be Abstract, Derived, and Concrete
– E.g. Mathematical, Data Structures, Image Processing
– Organization of Information (Classes in the Web Ontology Language, OWL)
– Simulation of Physical Systems
– Most Often Software Libraries
http://en.wikipedia.org/wiki/Platypus#mediaviewer/File:Wild_Platypus_4.jpg
https://www.youtube.com/watch?v=kDay5OWDPn4#t=26
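The platypus linked above, a mammal that lays eggs, is a handy picture of multiple inheritance. A minimal sketch in Python, which supports multiple inheritance directly; the class names and methods are hypothetical illustrations, not a biological model.

```python
class Animal:
    def describe(self) -> str:
        return f"{type(self).__name__} is an animal"

class Mammal(Animal):
    def feed_young(self) -> str:
        return "milk"

class EggLayer(Animal):
    def reproduce(self) -> str:
        return "lays eggs"

class Platypus(Mammal, EggLayer):
    """Inherits methods from both parent classes (multiple inheritance)."""
    pass

p = Platypus()
print(p.describe(), p.feed_young(), p.reproduce())
# Method resolution order: which class is searched first for a method
print([c.__name__ for c in Platypus.__mro__])
```

Python resolves the shared Animal ancestor only once (the "diamond"), searching Mammal before EggLayer; C++ handles the same situation with virtual base classes.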
Quick Review of OO [not just C++]
Encapsulation of Data and Methods in an Instantiated Object
Objects are Instances from a Class Hierarchy
– Classes Define Encapsulated Data and Methods: Virtual Functions can Be Refined (Overridden); Pure Virtual Functions Defined in Abstract Classes must be Refined
– Can Inherit Data and Methods from Parent Classes
– Can In Fact Have Multiple Inheritance
– Instantiated Objects Call Dynamically Bound Methods [Determined at Runtime]
Enables Semantic Overload [Can be Done without OO too]
– Overloaded Functions (Methods), Resolved by Type Signatures or Subtype/Sub-class
– Overloaded Operators (E.g. math operators work not only on integers and real numbers, but also on vectors, matrices, and complex numbers)
– Derived Data Types from Base types
Polymorphism
– Parametric – Re-usable Templates (E.g. Ada and Java Generics, C++ Templates)
– Functional Semantic Overloading
– Dynamic or Subtype or Subclass Polymorphism using Late Binding
OOPLs – From Smalltalk to more current Java, C++, Ada95, … CLOS
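Abstract classes and late binding can be seen in a few lines. A minimal sketch in Python, where an `abc.abstractmethod` plays roughly the role of a C++ pure virtual function; the Shape, Rectangle, and Circle classes are hypothetical examples.

```python
import math
from abc import ABC, abstractmethod

class Shape(ABC):
    """Abstract class: area() must be refined by every concrete subclass."""

    @abstractmethod
    def area(self) -> float: ...

    def describe(self) -> str:
        # self.area() is bound at runtime to the subclass's method (late binding)
        return f"{type(self).__name__} with area {self.area():.2f}"

class Rectangle(Shape):
    def __init__(self, w: float, h: float) -> None:
        self._w, self._h = w, h      # encapsulated state (underscore = internal by convention)

    def area(self) -> float:
        return self._w * self._h

class Circle(Shape):
    def __init__(self, r: float) -> None:
        self._r = r

    def area(self) -> float:
        return math.pi * self._r ** 2

shapes = [Rectangle(2, 3), Circle(1)]
descriptions = [s.describe() for s in shapes]
print(descriptions)

# An abstract class cannot be instantiated until its abstract methods are refined
try:
    Shape()
    abstract_rejected = False
except TypeError:
    abstract_rejected = True
```

The same `describe()` call produces different behavior for each object because the method it calls is chosen by the object's runtime class, not by the declared type of the list.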
Operator and Function Overloading
What is Required to Be OO? Common Consensus is:
– Encapsulation, Class Hierarchy, Polymorphism (Parametric & Subtype or Subclass with Late Binding), Inheritance
– Operator Overloading Not Required (E.g. Java Frowns Upon it; No Support for User-Defined Operators)
– Some PLs have OO Features, but not All
http://en.wikipedia.org/wiki/Operator_overloading
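Operator overloading lets "+" and "==" work on vectors, and it also ties back to the vector EQUIVALENCE example from the Quiz #2 review. A minimal sketch in Python, which overloads operators through special methods; the Vector class is a hypothetical illustration.

```python
class Vector:
    """Minimal vector type whose operators are overloaded via special methods."""

    def __init__(self, *components: float) -> None:
        self.components = tuple(components)

    def __add__(self, other: "Vector") -> "Vector":
        # "+" on vectors: component-wise addition
        if len(self.components) != len(other.components):
            raise ValueError("vectors must have the same dimension")
        return Vector(*(a + b for a, b in zip(self.components, other.components)))

    def __eq__(self, other: object) -> bool:
        # "==" on vectors: true only if every component matches,
        # i.e. X[0] == Y[0] and X[1] == Y[1] and X[2] == Y[2]
        if not isinstance(other, Vector):
            return NotImplemented
        return self.components == other.components

    def __hash__(self) -> int:
        # keep hashing consistent with the overloaded ==
        return hash(self.components)

    def __repr__(self) -> str:
        return f"Vector{self.components}"

X = Vector(1, 1, 3)
Y = Vector(1, 1, 3)
print(X == Y, X + Y)
```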
Storing Objects in Relational Databases
One approach to achieving persistence with an OOPL is to use an RDBMS as the underlying storage engine. Native OODBMS products take the other approach:
– O2 – merged with Informix and acquired by IBM
– ObjectStore – http://www.objectstore.com/
– Objectivity – http://www.objectivity.com/products/objectivitydb
– Versant – http://www.actian.com/products/operational-databases/
This requires mapping class instances (i.e. objects) to one or more tuples distributed over one or more relations. To handle the class hierarchy, there are two basic tasks to perform:
(1) design relations to represent the class hierarchy; (2) design how objects will be accessed.
Pearson Education © 2009
Mapping Classes to Relations
There are a number of strategies for mapping classes to relations, although each results in a loss of semantic information.
(1) Map each class or subclass to a relation:
Staff (staffNo, fName, lName, position, sex, DOB, salary)
Manager (staffNo, bonus, mgrStartDate)
SalesPersonnel (staffNo, salesArea, carAllowance)
Secretary (staffNo, typingSpeed)
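Strategy (1) splits one object across several tuples, so saving writes to two relations and loading needs a join. A minimal sketch in Python with sqlite3; the `save_manager` and `load_manager` helpers are hypothetical, the sex and DOB columns are omitted for brevity, and the bonus and start-date values are made up.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Manager:                   # a subclass of Staff in the class hierarchy
    staffNo: str
    fName: str
    lName: str
    position: str
    salary: int
    bonus: int
    mgrStartDate: str

def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT,
                            position TEXT, salary INTEGER);
        CREATE TABLE Manager (staffNo TEXT PRIMARY KEY REFERENCES Staff(staffNo),
                              bonus INTEGER, mgrStartDate TEXT);
    """)

def save_manager(conn: sqlite3.Connection, m: Manager) -> None:
    # One object becomes two tuples: inherited attributes in Staff, own ones in Manager
    conn.execute("INSERT INTO Staff VALUES (?, ?, ?, ?, ?)",
                 (m.staffNo, m.fName, m.lName, m.position, m.salary))
    conn.execute("INSERT INTO Manager VALUES (?, ?, ?)",
                 (m.staffNo, m.bonus, m.mgrStartDate))

def load_manager(conn: sqlite3.Connection, staff_no: str) -> Manager:
    # Reassembling the object requires a join on the shared key
    row = conn.execute("""
        SELECT s.staffNo, s.fName, s.lName, s.position, s.salary, m.bonus, m.mgrStartDate
        FROM Staff s JOIN Manager m ON s.staffNo = m.staffNo
        WHERE s.staffNo = ?""", (staff_no,)).fetchone()
    return Manager(*row)

conn = sqlite3.connect(":memory:")
create_schema(conn)
original = Manager("SG5", "Susan", "Brand", "Manager", 24000, 2350, "2000-06-01")
save_manager(conn, original)
restored = load_manager(conn, "SG5")
print(restored == original)
```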
(2) Map each subclass to a relation:
Manager (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate)
SalesPersonnel (staffNo, fName, lName, position, sex, DOB, salary, salesArea, carAllowance)
Secretary (staffNo, fName, lName, position, sex, DOB, salary, typingSpeed)
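With strategy (2) there is no Staff relation at all, so a question about "all staff" has to name every subclass table. A minimal sketch in Python with sqlite3; columns are trimmed to a few per table, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Manager (staffNo TEXT PRIMARY KEY, fName TEXT, salary INTEGER,
                          bonus INTEGER, mgrStartDate TEXT);
    CREATE TABLE SalesPersonnel (staffNo TEXT PRIMARY KEY, fName TEXT, salary INTEGER,
                                 salesArea TEXT, carAllowance INTEGER);
    CREATE TABLE Secretary (staffNo TEXT PRIMARY KEY, fName TEXT, salary INTEGER,
                            typingSpeed INTEGER);
    INSERT INTO Manager VALUES ('SG5', 'Susan', 24000, 2350, '2000-06-01');
    INSERT INTO SalesPersonnel VALUES ('SL21', 'John', 30000, 'North', 5000);
    INSERT INTO Secretary VALUES ('SG37', 'Ann', 12000, 60);
""")

# The superclass exists only implicitly: rebuild it with UNION ALL over every subclass
all_staff = conn.execute("""
    SELECT staffNo, salary FROM Manager
    UNION ALL SELECT staffNo, salary FROM SalesPersonnel
    UNION ALL SELECT staffNo, salary FROM Secretary
    ORDER BY staffNo""").fetchall()
print(all_staff)
```

Adding a new subclass later means editing every such query, which is part of the semantic loss the slide mentions.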
(3) Map the hierarchy to a single relation:
Staff (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate, salesArea, carAllowance, typingSpeed, typeFlag)
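Strategy (3) keeps one wide relation and relies on typeFlag to say which subclass each row belongs to. A minimal sketch in Python with sqlite3; the typeFlag values, the CHECK constraint, the trimmed column list, and the rows are all hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (
        staffNo TEXT PRIMARY KEY, fName TEXT, salary INTEGER,
        bonus INTEGER, mgrStartDate TEXT,          -- Manager only
        salesArea TEXT, carAllowance INTEGER,      -- SalesPersonnel only
        typingSpeed INTEGER,                       -- Secretary only
        typeFlag TEXT NOT NULL
            CHECK (typeFlag IN ('Manager', 'SalesPersonnel', 'Secretary')));
    INSERT INTO Staff (staffNo, fName, salary, bonus, mgrStartDate, typeFlag)
        VALUES ('SG5', 'Susan', 24000, 2350, '2000-06-01', 'Manager');
    INSERT INTO Staff (staffNo, fName, salary, salesArea, carAllowance, typeFlag)
        VALUES ('SL21', 'John', 30000, 'North', 5000, 'SalesPersonnel');
    INSERT INTO Staff (staffNo, fName, salary, typingSpeed, typeFlag)
        VALUES ('SG37', 'Ann', 12000, 60, 'Secretary');
""")

# Selecting one subclass means filtering on the type flag
managers = conn.execute(
    "SELECT staffNo, bonus FROM Staff WHERE typeFlag = 'Manager'").fetchall()

# Subclass-specific columns are NULL for every row of another subclass
null_bonus = conn.execute(
    "SELECT COUNT(*) FROM Staff WHERE bonus IS NULL").fetchone()[0]

# The CHECK constraint rejects a type that is not in the hierarchy
try:
    conn.execute("INSERT INTO Staff (staffNo, typeFlag) VALUES ('X1', 'Janitor')")
    bad_flag_rejected = False
except sqlite3.IntegrityError:
    bad_flag_rejected = True

print(managers, null_bonus, bad_flag_rejected)
```

Nothing in this schema stops a Secretary row from carrying a bonus; that rule, which the class hierarchy expressed for free, now has to be enforced separately.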
ORDBMSs
RDBMSs are currently the dominant database technology, with estimated sales of US$24 billion in 2011, expected to grow to US$37 billion by 2016.
Vendors of RDBMSs are conscious of the threat and promise of OODBMSs. They agree that RDBMSs are not currently suited to advanced database applications, and that added functionality is required. They reject the claim that extended RDBMSs will not provide sufficient functionality or will be too slow to cope adequately with the new complexity. They believe the shortcomings of the relational model can be remedied by extending the model with OO features.
ORDBMSs - Features
OO features being added include:
– user-extensible types,
– encapsulation,
– inheritance,
– polymorphism,
– dynamic binding of methods,
– complex objects including non-1NF objects,
– object identity.
However, there is no single extended relational model. All models:
– share basic relational tables and a query language,
– have some concept of ‘object’;
– and some can also store methods (or procedures or triggers).
Some analysts predict ORDBMSs will have a 50% larger share of the market than RDBMSs.
Stonebraker’s View
Advantages of ORDBMSs
Resolves many of the known weaknesses of RDBMSs.
Reuse and sharing:
– reuse comes from the ability to extend the server to perform standard functionality centrally;
– gives rise to increased productivity for both developer and end-user.
Preserves the significant body of knowledge and experience that has gone into developing relational applications.
Disadvantages of ORDBMSs
Complexity.
Increased costs.
Proponents of the relational approach believe the simplicity and purity of the relational model are lost.
Some believe the RDBMS is being extended for what will be a minority of applications.
OO purists are not attracted by the extensions either.
SQL is now extremely complex.
SQL:2011 - New OO Features
– Type constructors for row types and reference types.
– User-defined types (distinct types and structured types) that can participate in supertype/subtype relationships.
– User-defined procedures, functions, methods, and operators.
– Type constructors for collection types (arrays, sets, lists, and multisets).
– Support for large objects – BLOBs and CLOBs.
– Recursion.
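Recursion, the last feature listed, arrived in the standard as SQL:1999's WITH RECURSIVE query, which SQLite also implements. A minimal sketch in Python with sqlite3 that finds everyone reporting to a given supervisor at any depth; the supervisorNo column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY,
                        supervisorNo TEXT REFERENCES Staff(staffNo));
    INSERT INTO Staff VALUES ('SG5', NULL), ('SG14', 'SG5'), ('SG37', 'SG14'), ('SL21', NULL);
""")

# Seed with SG5's direct reports, then repeatedly join to find their reports
reports = conn.execute("""
    WITH RECURSIVE Reports(staffNo, depth) AS (
        SELECT staffNo, 1 FROM Staff WHERE supervisorNo = 'SG5'
        UNION ALL
        SELECT s.staffNo, r.depth + 1
        FROM Staff s JOIN Reports r ON s.supervisorNo = r.staffNo
    )
    SELECT staffNo, depth FROM Reports ORDER BY depth""").fetchall()
print(reports)
```

Without recursion, each level of the hierarchy would need another self-join, and the query would have to know the maximum depth in advance.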