Web and DBMSs: Web Technology and DBMSs; Semistructured Data and XML (Advanced Database System)

Upload: sherman-baldwin

Post on 26-Dec-2015


TRANSCRIPT

  • Slide 1
  • Web and DBMSs: Web Technology and DBMSs; Semistructured Data and XML. Advanced Database System. Lecturer: H. Ben Othmen
  • Slide 2
  • Part 1: Web Technology and DBMSs
  • Slide 3
  • Introduction to the Internet and Web
      • Internet: a worldwide collection of interconnected computer networks.
      • Intranet: a Web site or group of sites belonging to an organization, accessible only by the members of the organization.
      • Extranet: an intranet that is partially accessible to authorized outsiders.
      • The Web: a hypermedia-based system that provides a means of browsing information on the Internet in a non-sequential way using hyperlinks.
  • Slide 4
  • Introduction to the Internet and Web. Figure 6.1. The basic components of the Web environment.
  • Slide 5
  • HTTP (HyperText Transfer Protocol): the protocol used to transfer Web pages through the Internet. HTTP is based on a request-response paradigm. An HTTP transaction consists of the following stages:
      • Connection: the client establishes a connection with the Web server.
      • Request: the client sends a request message to the Web server.
      • Response: the Web server sends a response (for example, an HTML document) to the client.
      • Close: the connection is closed by the Web server.
  • Slide 6
  • HTTP: the main HTTP request types are:
      • GET: one of the most common types of request, which retrieves (gets) the resource the user has requested.
      • POST: another common type of request, which transfers (posts) data to the specified resource. Usually the data sent comes from an HTML form that the user has filled in, and the server may use this data to search the Internet or query a database.
  • Slide 7
  • HTTP: the main HTTP request types are (continued):
      • HEAD: similar to GET, but forces the server to return only an HTTP header instead of response data.
      • PUT (HTTP/1.1): uploads the resource to the server.
      • DELETE (HTTP/1.1): deletes the resource from the server.
      • OPTIONS (HTTP/1.1): requests the server's configuration options.
  • Slide 8
  • HTTP response: an HTTP response has a header containing the HTTP version, the status of the response, and header information to control the response behavior, as well as any requested data in a response body. As with a request, the header is separated from the body by a blank line.
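The message layout described in the HTTP slides can be sketched in a few lines of Python. This is a minimal illustration that works on plain strings rather than a real network connection; the helper names `build_request` and `parse_response`, the host `www.example.com`, and the sample response are assumptions made for the example, not part of the lecture.

```python
def build_request(method, path, host, body=""):
    """Assemble a raw HTTP/1.1 request message as text (illustrative helper)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # A blank line (CRLF CRLF) separates the header from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


# Hypothetical request and response, written out by hand.
request = build_request("GET", "/index.html", "www.example.com")
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)
version, status, reason, headers, body = parse_response(response)
print(request.splitlines()[0])
print(status, reason, headers["content-type"])
```

A real client would normally rely on a library such as Python's `http.client` rather than assembling messages by hand; the string version is used here only to make the header, blank line, and body visible.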
  • Slide 9
  • HTML (HyperText Markup Language): the document formatting language used to design most Web pages.
  • Slide 10
  • URL (Uniform Resource Locator): a string of alphanumeric characters that represents the location or address of a resource on the Internet and how that resource should be accessed.
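To make the two halves of that definition concrete (where the resource is, and how it should be accessed), the sketch below splits a URL into its parts with Python's standard `urllib.parse` module. The URL itself is a made-up example.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "http://www.example.com:8080/catalog/search?type=flat#results"
parts = urlsplit(url)

print(parts.scheme)    # how the resource is accessed (the protocol)
print(parts.hostname)  # which server holds the resource
print(parts.port)      # port on that server
print(parts.path)      # location of the resource on the server
print(parts.query)     # parameters passed to the resource
print(parts.fragment)  # position within the returned document
```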
  • Slide 11
  • Static and Dynamic Web Pages
      • Static Web page: the content of the document does not change unless the file itself is changed.
      • Dynamic Web page: the content is generated each time the page is accessed.
      • A dynamic Web page can have features that are not found in static pages, such as:
          • It can respond to user input from the browser (for example, returning data requested by the completion of a form, or the results of a database query).
          • It can be customized by and for each user.
  • Slide 12
  • Web Services: Web services are based on open standards and focus on communication and collaboration among people and applications. Central to the Web services approach is the use of widely accepted technologies and commonly used standards, such as:
      • The eXtensible Markup Language (XML).
      • SOAP (Simple Object Access Protocol), based on XML and used for communication over the Internet.
      • WSDL (Web Services Description Language), again based on XML and used to describe the Web service.
      • UDDI (Universal Description, Discovery and Integration), used to register the Web service for prospective users.
  • Slide 13
  • Approaches to Integrating the Web and DBMSs
      • Scripting languages such as JavaScript and VBScript.
      • Common Gateway Interface (CGI), one of the early, and possibly one of the most widely used, techniques.
      • HTTP cookies.
      • Extensions to the Web server, such as the Netscape API (NSAPI) and Microsoft's Internet Information Server API (ISAPI).
      • Java, J2EE, JDBC, SQLJ, JDO, Servlets, and JavaServer Pages (JSP).
      • Microsoft's Web Solution Platform: .NET, Active Server Pages (ASP), and ActiveX Data Objects (ADO).
      • Oracle's Internet Platform.
  • Slide 14
  • Scripting Languages: both the browser and the Web server can be extended to provide additional database functionality through the use of scripting languages. Scripting languages allow the creation of functions embedded within HTML code.
      • JavaScript and JScript: JavaScript and JScript are virtually identical interpreted scripting languages from Netscape and Microsoft, respectively.
  • Slide 15
  • Scripting Languages: JavaScript and JScript (continued)
      • Both languages are interpreted directly from the source code and permit scripting within an HTML document.
      • JavaScript is a very simple programming language that allows HTML pages to include functions and scripts that can recognize and respond to user events such as mouse clicks, user input, and page navigation.
  • Slide 16
  • Comparison of JavaScript and Java applets.
  • Slide 17
  • PHP: PHP (PHP: Hypertext Preprocessor) is another popular open source HTML-embedded scripting language. It is supported by many Web servers, including Apache HTTP Server and Microsoft's Internet Information Server, and is widely used for Web scripting on Linux. The goal of the language is to allow Web developers to write dynamically generated pages quickly. One of the advantages of PHP is its extensibility, and a number of extension modules have been provided to support such things as database connectivity, mail, and XML.
  • Slide 18
  • Common Gateway Interface (CGI): a specification for transferring information between a Web server and a CGI program.
      • CGI defines how scripts communicate with Web servers.
      • CGI scripts run in an environment created by a Web server program.
      • Running a CGI script from a Web browser is mostly transparent to the user.
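As a rough sketch of what "an environment created by a Web server" means in practice: the server places request details in environment variables such as `QUERY_STRING`, runs the program, and sends whatever the program writes to standard output back to the browser. The function name `handle_request` and the `branchNo` parameter are assumptions chosen for this example.

```python
import os
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI response from the variables the Web server sets (sketch)."""
    # The server passes the part of the URL after '?' in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    branch = params.get("branchNo", ["unknown"])[0]
    # Escape user input before placing it in HTML.
    body = f"<html><body><p>Branch requested: {escape(branch)}</p></body></html>"
    # A CGI program writes headers, a blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server, os.environ holds the CGI variables.
    print(handle_request(os.environ), end="")
```

In a database-backed page, the value read from the query string would typically become a parameter of a database query before the HTML is generated.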
  • Slide 19
  • Advantages of the Web-DBMS approach
      • Advantages that come through the use of a DBMS
      • Simplicity
      • Platform independence
      • Graphical User Interface
      • Standardization
      • Cross-platform support
      • Transparent network access
      • Scalable deployment
      • Innovation
  • Slide 20
  • Disadvantages of the Web-DBMS approach
      • Reliability
      • Security
      • Cost
      • Scalability
      • Limited functionality of HTML
      • Statelessness
      • Bandwidth
      • Performance
      • Immaturity of development tools
  • Slide 21
  • Oracle Internet Platform: the Oracle Internet Platform, comprising Oracle Application Server and the Oracle DBMS, is an n-tier architecture based on industry standards such as:
      • HTTP and HTML/XML for Web enablement.
      • Java, J2EE, Enterprise JavaBeans (EJB), JDBC and SQLJ for database connectivity, Java servlets, and JavaServer Pages (JSP).
      • The Object Management Group's CORBA technology for manipulating objects.
  • Slide 22
  • Oracle Internet Platform (continued)
      • Internet Inter-ORB Protocol (IIOP) for object interoperability, and Java Remote Method Invocation (RMI).
      • Web services, SOAP, WSDL, UDDI, ebXML, WebDAV, and LDAP.
      • XML and its related technologies.
  • Slide 23
  • Oracle Internet Platform. Figure 6.2. Oracle Internet Application Server.
  • Slide 24
  • Part 2: Semistructured Data and XML
  • Slide 25
  • Semistructured Data: data that may be irregular or incomplete and have a structure that may change rapidly or unpredictably.
      • With a DBMS based on semistructured data, the schema is discovered from the data, rather than imposed a priori.
      • In semistructured data, the information that is normally associated with a schema is contained within the data itself.
  • Slide 26
  • Semistructured Data: semistructured data has gained importance recently for various reasons, of which the following are of particular interest:
      • It may be desirable to treat Web sources like a database, but we cannot constrain these sources with a schema.
      • It may be desirable to have a flexible format for data exchange between disparate databases.
      • The emergence of XML (eXtensible Markup Language) as the standard for data representation and exchange on the Web, and the similarity between XML documents and semistructured data.
  • Slide 27
  • Example of semistructured data. Figure 6.3. Sample representation of semistructured data in the DreamHome database.
  • Slide 28
  • Example of semistructured data. Figure 6.4. A graphical representation of the data shown in Figure 6.3.
  • Slide 29
  • Example of semistructured data: the data describes one branch office (22 Deer Rd), two members of staff (John White and Ann Beech), two properties for rent (2 Manor Rd and 18 Dale Rd), and some relationships between the data. In particular, note that the data is not totally regular:
      • For John White we hold first and last names, but for Ann Beech we store the name as a single component and we also store a salary.
      • For the property at 2 Manor Rd we store a monthly rent, whereas for the property at 18 Dale Rd we store an annual rent.
      • For the property at 2 Manor Rd we store the property type (flat) as a string, whereas for the property at 18 Dale Rd we store the type (house) as an integer value.
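The irregularities listed for the DreamHome example can be mimicked with ordinary Python dictionaries, which also gives a small illustration of a schema being discovered from the data rather than imposed in advance. The field names, the rent figures, and the integer code `1` for "house" are placeholders invented for this sketch; only the names, addresses, and the salary of 12000 (taken from the OEM example in the slides) come from the lecture.

```python
# Two staff records with different shapes for the same concept.
staff = [
    {"name": {"fName": "John", "lName": "White"}},
    {"name": "Ann Beech", "salary": 12000},
]

# Two property records: different rent fields, different types for "type".
properties = [
    {"street": "2 Manor Rd", "type": "flat", "monthlyRent": 375},   # rent is a placeholder
    {"street": "18 Dale Rd", "type": 1, "annualRent": 7200},        # 1 stands for "house" (assumed code)
]


def discover_schema(records):
    """Collect, for each field name, the set of value types seen in the data."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


for field, types in sorted(discover_schema(properties).items()):
    print(field, sorted(types))
```

A field that maps to more than one type, such as `type` here, is exactly the kind of irregularity a fixed relational schema would reject.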
  • Slide 30
  • Object Exchange Model (OEM): one of the proposed models for semistructured data.
      • OEM is schema-less and self-describing, and can be thought of as a labeled directed graph where the nodes are objects.
      • An OEM object consists of a unique object identifier (for example, &7), a descriptive textual label (street), a type (string), and a value (22 Deer Rd).
  • Slide 31
  • Object Exchange Model (OEM): objects are decomposed into atomic and complex.
      • An atomic object contains a value for a base type (for example, integer or string) and can be recognized in the diagram as one that has no outgoing edges.
      • All other objects are called complex objects. Their type is a set of object identifiers, and they can be recognized in the diagram as ones that have one or more outgoing edges.
  • Slide 32
  • Object Exchange Model (OEM): an OEM object can be considered as a quadruple (label, oid, type, value). Example: we can represent the Staff object &4 that contains a name and salary, together with the name object &9 that contains the string "Ann Beech" and the salary object &10 that contains the decimal value 12000, as follows:
      {Staff, &4, set, {&9, &10}}
      {name, &9, string, "Ann Beech"}
      {salary, &10, decimal, 12000}
    OEM was designed specifically to handle the incompleteness of data, and the structure and type irregularity exhibited in this example.
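One possible way to hold these quadruples in code is sketched below. The class name `OEMObject`, the `objects` store, and the helper functions are assumptions made for illustration; OEM itself does not prescribe any programming interface.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class OEMObject:
    """An OEM quadruple: (label, oid, type, value)."""
    label: str
    oid: str
    type: str
    value: Union[str, int, float, frozenset]


# The three objects from the slide, keyed by object identifier.
objects = {
    "&4": OEMObject("Staff", "&4", "set", frozenset({"&9", "&10"})),
    "&9": OEMObject("name", "&9", "string", "Ann Beech"),
    "&10": OEMObject("salary", "&10", "decimal", 12000),
}


def is_atomic(obj):
    """Atomic objects hold a base-type value; complex ones hold a set of oids."""
    return obj.type != "set"


def children(obj, store):
    """Follow the outgoing edges of a complex object; atomic objects have none."""
    if is_atomic(obj):
        return {}
    return {store[oid].label: store[oid].value for oid in obj.value}


print(children(objects["&4"], objects))
```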
  • Slide 33
  • XML: a meta-language (a language for describing other languages) that enables designers to create their own customized tags to provide functionality not available with HTML.
      • XML is a restricted version of SGML (Standard Generalized Markup Language), designed especially for Web documents.
      • For example, XML supports links that point to multiple documents, as opposed to an HTML link, which can reference just one destination document.
  • Slide 34
  • Advantages of XML
      • Simplicity
      • Open standard and platform/vendor-independent
      • Extensibility
      • Reuse
      • Separation of content and presentation
      • Improved load balancing
      • Support for the integration of data from multiple sources
      • Ability to describe data from a wide variety of applications
      • More advanced search engines
      • New opportunities
  • Slide 35
  • Example XML. Figure 6.5. Example XML to represent staff information.
  • Slide 36
  • Overview of XML
      • XML declaration: XML documents begin with an optional XML declaration, which gives the version of XML, the encoding system used (UTF-8 for Unicode), and standalone = "yes" (which indicates that there are no external markup declarations). The second and third lines of the example XML document relate to style sheets and the DTD.
      • Elements: elements, or tags, are the most common form of markup. The first element must be a root element, which can contain other (sub)elements. Element names are case-sensitive, so an element <STAFF> would be different from an element <staff>.
  • Slide 37
  • Overview of XML
      • <STAFFLIST>: a root element.
      • An element begins with a start-tag (for example, <STAFF>) and ends with an end-tag (for example, </STAFF>).
      • In <STAFF><NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME></STAFF>, the element NAME is completely nested within the element STAFF, and the elements FNAME and LNAME are nested within the element NAME.
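The nesting of STAFF, NAME, FNAME, and LNAME can be checked with Python's standard `xml.etree.ElementTree` parser. The fragment below is a hand-written sketch consistent with the element names in the slide, not a copy of Figure 6.5, and `outline` is a helper name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment using the element names from the slide.
fragment = "<STAFF><NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME></STAFF>"
staff = ET.fromstring(fragment)


def outline(element, depth=0):
    """Return one indented line per element to show how elements nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(staff)))
print(staff.find("NAME/FNAME").text, staff.find("NAME/LNAME").text)
```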
  • Slide 38
  • Overview of XML
      • Attributes: attributes are name-value pairs that contain descriptive information about an element. The attribute is placed inside the start-tag after the corresponding element name, with the attribute value enclosed in quotes.
      • Entity references: an entity reference starts with an ampersand (&) and ends with a semicolon (;), for example &lt; (which stands for the character <).
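A short sketch of both ideas with `xml.etree.ElementTree`: the attribute `branchNo` is read from the start-tag, and the entity references `&lt;` and `&amp;` are turned back into `<` and `&` by the parser. The NOTE element and its text are invented for this example.

```python
import xml.etree.ElementTree as ET

# The NOTE element is hypothetical; branchNo follows the attribute name
# used in the XQuery examples of this lecture.
source = '<STAFF branchNo="B005"><NOTE>salary &lt; 30000 &amp; rising</NOTE></STAFF>'
element = ET.fromstring(source)

branch = element.get("branchNo")      # attribute value from the start-tag
note = element.find("NOTE").text      # entity references are resolved on parsing

# Serializing again re-escapes the special characters.
serialized = ET.tostring(element.find("NOTE"), encoding="unicode")

print(branch)
print(note)
print(serialized)
```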
  • Slide 39
  • Overview of XML
      • Comments: comments are enclosed in <!-- and --> tags and can contain any data except the literal string --.
      • CDATA sections and processing instructions: a CDATA section instructs the XML processor to ignore markup characters and pass the enclosed text directly to the application without interpretation. A processing instruction is of the form <?name pidata?>, where name identifies the processing instruction to the application.
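The effect of a CDATA section can be seen by parsing a small document in Python. In this sketch the REMARK element and its content are invented; the behavior shown (comments dropped by the default `ElementTree` parser, CDATA text passed through untouched) is that of Python's standard library parser, and other processors may expose comments differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a comment and a CDATA section.
source = (
    "<STAFF>"
    "<!-- internal note -->"
    "<REMARK><![CDATA[salary < 30000 & <b>fixed</b>]]></REMARK>"
    "</STAFF>"
)
root = ET.fromstring(source)

# The markup characters inside CDATA arrive as plain text.
remark = root.find("REMARK").text
# The default parser does not keep the comment as a child node.
child_tags = [child.tag for child in root]

print(remark)
print(child_tags)
```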
  • Slide 40
  • Overview of XML
      • Ordering: in XML, elements are ordered. Example: the following two fragments, with the FNAME and LNAME elements transposed, are different:
          <NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>
          <NAME><LNAME>White</LNAME><FNAME>John</FNAME></NAME>
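The point about ordering can be verified directly: the two fragments contain the same elements, but a parser reports them in document order, so the sequences differ. The helper name `child_sequence` is an assumption for this sketch.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring("<NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>")
second = ET.fromstring("<NAME><LNAME>White</LNAME><FNAME>John</FNAME></NAME>")


def child_sequence(element):
    """List (tag, text) pairs in document order."""
    return [(child.tag, child.text) for child in element]


same_content = sorted(child_sequence(first)) == sorted(child_sequence(second))
same_order = child_sequence(first) == child_sequence(second)
print(same_content, same_order)
```

By contrast, the rows of a relational table have no inherent order, which is one reason mapping XML to relations needs care.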
  • Slide 41
  • Document Type Definitions (DTDs): a DTD defines the valid syntax of an XML document by listing the element names that can occur in the document, which elements can appear in combination with which other ones, how elements can be nested, what attributes are available for each element type, and so on.
  • Slide 42
  • Document Type Definitions (DTDs). Figure 6.6. DTD for the XML document of Figure 6.5.
  • Slide 43
  • Simple Object Access Protocol (SOAP)
      • Is an XML-based messaging protocol that defines a set of rules for structuring messages.
      • Is not tied to any particular operating system or programming language.
      • Is an important building block for developing Web services.
      • A SOAP message is an ordinary XML document containing the following elements:
          • A required Envelope element that identifies the XML document as a SOAP message.
  • Slide 44
  • Simple Object Access Protocol (SOAP): elements of a SOAP message (continued)
      • An optional Header element that contains application-specific information such as authentication or payment information.
      • A required Body element that contains call and response information.
      • An optional Fault element that provides information about errors that occurred while processing the message.
  • Slide 45
  • Example SOAP message. Figure 6.7. Example SOAP message. This figure illustrates a simple SOAP message that obtains the price of property PG36.
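Since Figure 6.7 is not reproduced in this transcript, the following sketch builds a comparable message with `xml.etree.ElementTree`. It is not the figure's message: the operation name `GetPropertyPrice` and the `propertyNo` element are hypothetical, and the namespace shown is the SOAP 1.2 envelope namespace, which the slides do not specify.

```python
import xml.etree.ElementTree as ET

# SOAP 1.2 envelope namespace (an assumption; the slides do not name a version).
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
ET.register_namespace("soap", SOAP_NS)


def build_envelope(property_no):
    """Build Envelope > (Header, Body) with a hypothetical price request."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")       # required
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")       # optional
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")  # required
    # GetPropertyPrice and propertyNo are illustrative names only.
    request = ET.SubElement(body, "GetPropertyPrice")
    ET.SubElement(request, "propertyNo").text = property_no
    return envelope


envelope = build_envelope("PG36")
print(ET.tostring(envelope, encoding="unicode"))
```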
  • Slide 46
  • XML Query Languages: what is XQuery?
      • XQuery is a language for querying XML data.
      • XQuery is to XML what SQL is to relational databases.
      • XQuery is built on XPath expressions.
      • XQuery is supported by many major database systems.
      • XQuery is a W3C Recommendation.
      • XQuery is a language for finding and extracting elements and attributes from XML documents.
  • Slide 47
  • XML-QL example: to find the surnames of staff who earn more than 30,000, we could use the following query:
      WHERE <STAFF>
                <SALARY> $S </SALARY>
                <NAME><FNAME> $F </FNAME> <LNAME> $L </LNAME></NAME>
            </STAFF> IN "http://www.dreamhome.co.uk/staff.xml"
            $S > 30000
      CONSTRUCT <LNAME> $L </LNAME>
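For readers without an XML-QL processor, the same question (surnames of staff earning more than 30,000) can be answered procedurally in Python. The sample document below is invented for the sketch, including the staff member Brand and all salary figures other than 12000 and 30000; the function name `surnames_above` is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical staff list; not the contents of staff.xml from the slide.
STAFF_XML = """
<STAFFLIST>
  <STAFF><SALARY>30000</SALARY><NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME></STAFF>
  <STAFF><SALARY>12000</SALARY><NAME><FNAME>Ann</FNAME><LNAME>Beech</LNAME></NAME></STAFF>
  <STAFF><SALARY>36000</SALARY><NAME><FNAME>Susan</FNAME><LNAME>Brand</LNAME></NAME></STAFF>
</STAFFLIST>
"""


def surnames_above(xml_text, threshold):
    """Match STAFF elements (the WHERE part) and build LNAME elements (CONSTRUCT)."""
    root = ET.fromstring(xml_text)
    results = []
    for staff in root.findall("STAFF"):
        salary = float(staff.findtext("SALARY"))
        if salary > threshold:
            constructed = ET.Element("LNAME")
            constructed.text = staff.findtext("NAME/LNAME")
            results.append(ET.tostring(constructed, encoding="unicode"))
    return results


print(surnames_above(STAFF_XML, 30000))
```

Note that a salary of exactly 30000 does not satisfy the condition, because the comparison is strictly greater than.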
  • Slide 48
  • Examples of XQuery path expressions
      (1) Find the staff number of the first member of staff in the XML document of Figure 6.5.
          doc("staff_list.xml")/STAFFLIST/STAFF[1]//STAFFNO
      (2) Find the staff numbers of the first two members of staff.
          doc("staff_list.xml")/STAFFLIST/STAFF[position() <= 2]/STAFFNO
      (3) Find the surnames of the staff at branch B005.
          doc("staff_list.xml")/STAFFLIST/STAFF[@branchNo = "B005"]//LNAME
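Python's `xml.etree.ElementTree` supports only a limited subset of XPath, but it is enough to approximate the three path expressions. The sample document is invented for this sketch (staff numbers SG37 and SL41, and the third staff member, are placeholders), and the second query uses list slicing because the ElementTree subset has no positional range predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for staff_list.xml.
STAFF_LIST = """
<STAFFLIST>
  <STAFF branchNo="B005"><STAFFNO>SL21</STAFFNO><NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME></STAFF>
  <STAFF branchNo="B003"><STAFFNO>SG37</STAFFNO><NAME><FNAME>Ann</FNAME><LNAME>Beech</LNAME></NAME></STAFF>
  <STAFF branchNo="B005"><STAFFNO>SL41</STAFFNO><NAME><FNAME>Julie</FNAME><LNAME>Lee</LNAME></NAME></STAFF>
</STAFFLIST>
"""
root = ET.fromstring(STAFF_LIST)  # root is the STAFFLIST element

# (1) Staff number of the first member of staff.
first_no = root.find("STAFF[1]//STAFFNO").text

# (2) Staff numbers of the first two members of staff (slice instead of a range predicate).
first_two = [staff.findtext("STAFFNO") for staff in root.findall("STAFF")[:2]]

# (3) Surnames of the staff at branch B005.
b005_surnames = [e.text for e in root.findall("STAFF[@branchNo='B005']//LNAME")]

print(first_no, first_two, b005_surnames)
```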
  • Slide 49
  • XML and Databases: as the amount of data in XML format expands, there will be an increasing demand to store, retrieve, and query this data. It is anticipated that two main models will exist:
      • Data-centric model: the fact that the data is stored and transferred as XML is incidental, and other formats could also have been used. In this case, the data could be stored in a relational, object-relational, or object-oriented DBMS.
  • Slide 50
  • XML and Databases (continued)
      • XML has been completely integrated into the Oracle9i and Oracle10g systems.
      • Document-centric model: the documents are designed for human consumption (for example, books, newspapers, and e-mail). The data will be irregular or incomplete, and its structure may change rapidly or unpredictably. Underlying such a system, there may now be a native XML database (NXD).
  • Slide 51
  • XML and SQL: new XML data type. Example: create a table to hold staff data as XML data.
      CREATE TABLE XMLStaff (
          docNo     CHAR(4),
          docDate   DATE,
          staffData XML,
          PRIMARY KEY (docNo));
    INSERT statement:
      INSERT INTO XMLStaff VALUES ('D001', DATE '2004-12-01',
          XML('<STAFF>
                 <STAFFNO>SL21</STAFFNO>
                 <POSITION>Manager</POSITION>
                 <DOB>1945-10-01</DOB>
                 <SALARY>30000</SALARY>
               </STAFF>'));
  • Slide 52
  • XML and SQL: SQL/XML operators. Several operators have been defined that produce XML values, such as:
      • XMLELEMENT, to generate an XML value with a single element as a child of its root item.
      • XMLFOREST, to generate an XML value with a list of elements as children of a root item.
      • XMLCONCAT, to concatenate a list of XML values.
      • XMLPARSE, to perform a non-validating parse of a character string to produce an XML value.
  • Slide 53
  • XML and SQL: SQL/XML operators (continued)
      • XMLROOT, to create an XML value by modifying the properties of the root item of another XML value.
      • XMLCOMMENT, to generate an XML comment.
      • XMLPI, to generate an XML processing instruction.
    Two useful functions are:
      • XMLSERIALIZE, to generate a character or binary string from an XML value.
      • XMLAGG, an aggregate function, to generate a forest of elements from a collection of elements.
  • Slide 54
  • Example using the XML operators: list all staff with a salary greater than 20,000, represented as an XML element containing the member of staff's name, with the branch number as an attribute.
      SELECT staffNo,
             XMLELEMENT (NAME "STAFF",
                         fName || ' ' || lName,
                         XMLATTRIBUTES (branchNo AS "branchNumber")) AS staffXMLCol
      FROM Staff
      WHERE salary > 20000;
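The result of this query can be approximated in Python for experimentation. SQLite, used here through the standard `sqlite3` module, has no SQL/XML operators, so the element is built in Python with `ElementTree` instead of XMLELEMENT and XMLATTRIBUTES; this swaps in a different technique and is only a sketch. The sample rows (including staff member SG5 and the salary of 24000) are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, "
    "branchNo TEXT, salary REAL)"
)
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?, ?)",
    [
        ("SL21", "John", "White", "B005", 30000),
        ("SG37", "Ann", "Beech", "B003", 12000),
        ("SG5", "Susan", "Brand", "B003", 24000),
    ],
)


def staff_as_xml(connection, min_salary):
    """Return (staffNo, XML string) pairs for staff earning more than min_salary."""
    rows = connection.execute(
        "SELECT staffNo, fName, lName, branchNo FROM Staff "
        "WHERE salary > ? ORDER BY staffNo",
        (min_salary,),
    )
    result = []
    for staff_no, f_name, l_name, branch_no in rows:
        # Plays the role of XMLELEMENT(NAME "STAFF", ..., XMLATTRIBUTES(...)).
        element = ET.Element("STAFF", {"branchNumber": branch_no})
        element.text = f"{f_name} {l_name}"
        result.append((staff_no, ET.tostring(element, encoding="unicode")))
    return result


for staff_no, staff_xml in staff_as_xml(conn, 20000):
    print(staff_no, staff_xml)
```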
  • Slide 55
  • Result table for the example.
  • Slide 56
  • Native XML Database (NXD): defines a (logical) data model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model.
      • At a minimum, the model must include elements, attributes, PCDATA, and document order.
      • The XML document must be the unit of (logical) storage, although it is not restricted by any underlying physical storage model.
  • Slide 57
  • Bibliography: Thomas Connolly & Carolyn Begg, Database Systems: A Practical Approach to Design, Implementation, and Management, fourth edition.