Technologies for distributed systems
Andrew Jones
School of Computer Science
Cardiff University
Need to bring data together
• To achieve breadth (e.g. coverage of more organisms)
• To achieve depth (e.g. more complete data on individual species)
Merging
1. The original databases are physically copied into a new combined database.
2. The user interacts with the new combined database.
[Diagram: "Plants of Europe" and "Plants of Africa" are copied (1) into "Plants of the World", which the user accesses (2).]
Linking
1. The original databases remain separate, but are accessed via a single system such as a portal.
2. The user interacts with an access system which does not itself contain data. When the user requests data, it is fetched from the appropriate database.
[Diagram: "Plants of Europe" and "Plants of Africa" stay separate and are reached (1) through a "Plants of the World" access system, with which the user interacts (2).]
Basic problems to solve
How to deal with data that is:
• On various database management systems
• Distributed across various machines
• Distributed across various machines of various types
• Based on various schemata (i.e. not all data expressed in the same form)
Also, how to resolve data quality problems:
– taxonomists vary in their opinions
– large taxonomic treatments are generally inconsistent
– individual databases generally have mistakes
– (So we need tools to help biologists detect and resolve such problems, such as LITCHI – not today’s topic!)
Essential elements of solution
• Ways of setting up communication between components
• Ways of expressing data suitably for it to be communicated between components
• Ways of describing and finding components such as data sources
Setting up communication
Possibilities include:
• CGI (Common Gateway Interface)-style HTTP requests (a standard for communicating requests to Web servers)
• Z39.50 (a standard for digital libraries)
• Web Services
• DiGIR
CGI-style HTTP requests
• Simple way of passing parameters
• in one variant (GET), parameters expressed as part of the URL, e.g.
http://www.ildis.org/LegumeWeb?genus~Sabinea&species~florida
(NB: POST preferred)
• Result: an HTML page (see next slide)
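As a minimal sketch of the GET variant, the snippet below builds a query URL from named parameters using Python's standard library and then decodes it again, as a server would. The base URL and parameter names are placeholders for illustration only (the LegumeWeb example above uses its own `~` separator rather than the conventional `=`), and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_get_url(base_url: str, params: dict) -> str:
    """Append URL-encoded parameters to base_url as a query string."""
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and parameter names, for illustration only.
url = build_get_url(
    "http://example.org/species",
    {"genus": "Sabinea", "species": "florida"},
)

# The receiving side would decode the same parameters from the query string.
decoded = parse_qs(urlsplit(url).query)
```

With POST, the same encoded parameters would travel in the request body instead of the URL.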
“CGI” approach: strengths & limitations
• Easy to set up
• Not good for complex data
• HTML is basically a formatting language, for saying how documents should be displayed, not what they contain
• But we can pass around XML too
– E.g. SPICE
– Also HTTP is the usual transport for SOAP (see later)
SPICE
• Species 2000 Interoperability Cooperation Environment
• Allows choice between
– HTTP GET/XML response (essentially the CGI approach, but retrieving XML)
– CORBA
• Uses wrappers to transform to common data model & SPICE protocols
SPICE architecture
[Architecture diagram. Components shown:
• Users (Web browsers)
• CAS (Common Access System): user server module (HTTP), CAS knowledge repository, ‘query’ co-ordinator
• CORBA link between the CAS and the CORBA ‘wrapping’ element of each GSD wrapper (in some cases, generic)
• Wrappers (e.g. JDBC, CGI)
• GSDs]
Z39.50
• A standard for digital libraries
• (Most library systems are built around this standard)
• For interoperability in client-server architectures
• Standardised sets of attributes (items of data)
Z39.50 strengths & limitations
• Standard for digital libraries
• Works well for certain widespread, agreed data standards (‘profiles’)
• Very restrictive if you want to add on things like extra security
• Useless in cases where a data standard doesn’t yet exist
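Web Services
• Web services provide a simple way of making software available on the Internet.
• All the communications in this diagram are SOAP messages
[Diagram: three parties, a Service Provider, a Service Consumer and a Service directory (e.g. UDDI), exchanging the following messages:
• Provider to directory: register service description (WSDL)
• Consumer to directory: directory query
• Directory to consumer: query responses (WSDL)
• Consumer to provider: XML service request, based on WSDL
• Provider to consumer: XML service response, based on WSDL]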
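To make the "XML service request" concrete, here is a rough sketch of wrapping a request in a SOAP 1.1 envelope with Python's standard `xml.etree.ElementTree`. The envelope namespace is the standard SOAP 1.1 one; the `getSpecies` operation, its parameters and the service namespace are hypothetical, since a real service would define them in its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/species-service"      # hypothetical service namespace


def build_request(genus: str, epithet: str) -> bytes:
    """Return a SOAP envelope whose body carries one (hypothetical) getSpecies call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}getSpecies")
    ET.SubElement(call, f"{{{SERVICE_NS}}}genus").text = genus
    ET.SubElement(call, f"{{{SERVICE_NS}}}epithet").text = epithet
    return ET.tostring(envelope, encoding="utf-8")


message = build_request("Sabinea", "punicea")

# A provider would parse the envelope and dispatch on the element inside Body.
parsed = ET.fromstring(message)
operation = parsed.find(f"{{{SOAP_NS}}}Body")[0]
```

In practice a SOAP toolkit would generate this plumbing from the WSDL; the sketch only shows what travels over the wire.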
DiGIR
• Proprietary approach, especially designed for specimen records
• Uses Darwin Core data model
• The following slide is the DiGIR team’s high-level architecture diagram ...
XML (eXtensible Mark-up Language)
• Flexible mark-up language
• Like HTML, but tags describe the document’s contents, not how it’s to be displayed.
• XML is the basis of SOAP: ‘language independent’, i.e. a good data interchange format.
<!DOCTYPE SPECIESLIST [
  <!ELEMENT SPECIESLIST (SPECIES*)>
  <!ELEMENT SPECIES (GENUS, EPITHET, AUTHORITY?)>
  <!ELEMENT GENUS (#PCDATA)>
  <!ELEMENT EPITHET (#PCDATA)>
  <!ELEMENT AUTHORITY (#PCDATA)>
]>
<SPECIESLIST>
  <SPECIES>
    <GENUS>Vicia</GENUS>
    <EPITHET>Faba</EPITHET>
  </SPECIES>
  <SPECIES>
    <GENUS>Sabinea</GENUS>
    <EPITHET>punicea</EPITHET>
    <AUTHORITY>Urban</AUTHORITY>
  </SPECIES>
</SPECIESLIST>
Simplified Species 2000 example
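Because the tags name the content, a program can pull fields out of the species list above directly. The following is a small sketch using Python's standard `xml.etree.ElementTree`; the function name `read_species` is made up for illustration, and note that this parser does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<SPECIESLIST>
  <SPECIES><GENUS>Vicia</GENUS><EPITHET>Faba</EPITHET></SPECIES>
  <SPECIES><GENUS>Sabinea</GENUS><EPITHET>punicea</EPITHET><AUTHORITY>Urban</AUTHORITY></SPECIES>
</SPECIESLIST>"""


def read_species(xml_text: str) -> list:
    """Return (genus, epithet, authority) tuples; authority is None when absent."""
    root = ET.fromstring(xml_text)
    return [
        (s.findtext("GENUS"), s.findtext("EPITHET"), s.findtext("AUTHORITY"))
        for s in root.findall("SPECIES")
    ]


records = read_species(DOCUMENT)
```

The optional `AUTHORITY?` element in the DTD shows up here as a missing value rather than an error.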
Dimensions of interoperability
• System
• Syntactic
• Structural
• Semantic
Syntactic interoperability – some problems
• Differences in machine-readable aspects of data representation (formatting), e.g. records held as XML:
<species>
  <genus>Vicia</genus>
  <epithet>faba</epithet>
</species>
<species>
  <genus>Faba</genus>
  <epithet>faba</epithet>
</species>
…
and as a table:
Genus   Epithet
Vicia   Faba
Faba    Faba
…
Syntactic interoperability – some solutions
• Typically fairly easy to write converters between formats
• “Wizards” (if we’re going to do data preparation first)
• XSLT (transforming between XML documents holding same information in different formats)
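A format converter of the kind mentioned in the first bullet can be very short. The sketch below, which assumes a `<specieslist>` root element wrapped around the `<species>` records (the wrapper and the function name are illustrative assumptions), turns XML records into tab-separated rows using only Python's standard library. It is a hand-written converter, not XSLT.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The root element name is an assumption; only <species> records appear in the slides.
XML_FORM = """<specieslist>
  <species><genus>Vicia</genus><epithet>faba</epithet></species>
  <species><genus>Faba</genus><epithet>faba</epithet></species>
</specieslist>"""


def xml_to_table(xml_text: str) -> str:
    """Convert <species> records into tab-separated rows with a header line."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Genus", "Epithet"])
    for sp in root.iter("species"):
        writer.writerow([sp.findtext("genus"), sp.findtext("epithet")])
    return out.getvalue()


table = xml_to_table(XML_FORM)
```

The information content is unchanged; only the syntax differs, which is why this dimension is usually the easiest to handle.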
Structural interoperability – some problems
• Representational heterogeneity that involves data modelling constructs
• Schematic heterogeneity
• For example …
Structural heterogeneity example
Database 1 (a single table):
Id   Name
:    :
25   Vicia faba
26   Faba faba
:    :
Database 2 (three tables linked by Id):
Id   Genus   Epithet
:    :       :
25   42      9
26   44      9
:    :       :
Id   GenusName
:    :
42   Vicia
43   Abrus
44   Faba
:    :
Id   EpithetName
:    :
8    vulgaris
9    faba
:    :
Structural interoperability – some solutions
• Database views
• XSLT (to some extent)
• Metadata & ontologies (associate terms in data sources with those in a shared vocabulary)
• “Wrapping” to map between heterogeneous data sources and a shared representation (common data model)
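As one possible illustration of the database-view approach, the sketch below loads the normalised genus/epithet tables from the structural heterogeneity example into an in-memory SQLite database and defines a view that presents them in the flat (Id, Name) shape. The table, column and view names are chosen for this example and are not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genus   (id INTEGER PRIMARY KEY, genus_name TEXT);
    CREATE TABLE epithet (id INTEGER PRIMARY KEY, epithet_name TEXT);
    CREATE TABLE species (id INTEGER PRIMARY KEY, genus INTEGER, epithet INTEGER);

    INSERT INTO genus   VALUES (42, 'Vicia'), (43, 'Abrus'), (44, 'Faba');
    INSERT INTO epithet VALUES (8, 'vulgaris'), (9, 'faba');
    INSERT INTO species VALUES (25, 42, 9), (26, 44, 9);

    -- View exposing the normalised tables in the flat (Id, Name) shape.
    CREATE VIEW species_flat AS
        SELECT s.id AS id, g.genus_name || ' ' || e.epithet_name AS name
        FROM species s
        JOIN genus g   ON g.id = s.genus
        JOIN epithet e ON e.id = s.epithet;
""")

rows = conn.execute("SELECT id, name FROM species_flat ORDER BY id").fetchall()
```

A client written against the flat schema can query `species_flat` without knowing how the underlying data is organised.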
Semantic interoperability – some problems
• Specimen distribution data example
– Database A holds data for Vicia faba
– Database B holds data for Faba faba
• Descriptive data example
– Database A: leaf length varies from 25.4 to 76.2 mm
– Database B: average leaf length 2 in
Semantic interoperability – some solutions
• It’s not an entirely solved problem!
• Useful general techniques:
– Use of ontologies (defining relationships between terms, e.g. units)
– Mapping functions
– Attached metadata
– …
• Domain-specific techniques
– “Synonymy server”
– LITCHI (as an integration tool)
– …
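Two of these techniques can be sketched in a few lines, using the leaf-length and species-name examples from the problems slide. The unit conversion is a simple mapping function driven by attached unit metadata. The synonym table is a toy stand-in for a synonymy server, and treating "Faba faba" as a synonym of "Vicia faba" is an assumption made purely for illustration.

```python
MM_PER_INCH = 25.4  # exact by definition


def to_mm(value: float, unit: str) -> float:
    """Mapping function: convert a length to millimetres using its unit metadata."""
    factors = {"mm": 1.0, "in": MM_PER_INCH}
    return value * factors[unit]


# Hypothetical synonym table; which name is "accepted" is an assumption here.
SYNONYMS = {"Faba faba": "Vicia faba"}


def accepted_name(name: str) -> str:
    """Toy 'synonymy server': resolve a name to its accepted form."""
    return SYNONYMS.get(name, name)


# Database B: "average leaf length 2 in", with the unit attached as metadata.
b_average_mm = to_mm(2, "in")
# Database A: range already in millimetres.
a_min_mm, a_max_mm = to_mm(25.4, "mm"), to_mm(76.2, "mm")

within_range = a_min_mm <= b_average_mm <= a_max_mm
same_taxon = accepted_name("Faba faba") == accepted_name("Vicia faba")
```

Converting units is the easy part; deciding whether an "average" and a "range" describe comparable measurements still needs human or ontology-level knowledge.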
Ontologies
• Agreed terminology
• Relationships between terms
• Example use: integrator can associate terms in a source database schema with those in an agreed federation schema
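The example use above might look roughly like this in code: each source declares how its local schema terms correspond to the agreed federation terms, and records are renamed accordingly. All source names, field names and federation terms below are hypothetical, and a real ontology would also record relationships between terms rather than just a flat mapping.

```python
# Agreed federation terms (hypothetical names, for illustration only).
FEDERATION_TERMS = {"genus", "epithet", "authority"}

# Per-source mappings from local schema terms to federation terms (also hypothetical).
TERM_MAP = {
    "database_a": {"GENUS": "genus", "EPITHET": "epithet", "AUTHORITY": "authority"},
    "database_b": {"GenusName": "genus", "EpithetName": "epithet"},
}

# Every mapping must target an agreed term.
assert all(set(m.values()) <= FEDERATION_TERMS for m in TERM_MAP.values())


def to_federation(source: str, record: dict) -> dict:
    """Rename a source record's fields to the agreed federation terms."""
    mapping = TERM_MAP[source]
    unknown = set(record) - set(mapping)
    if unknown:
        raise KeyError(f"no federation term for: {sorted(unknown)}")
    return {mapping[field]: value for field, value in record.items()}


a = to_federation("database_a", {"GENUS": "Vicia", "EPITHET": "faba"})
b = to_federation("database_b", {"GenusName": "Vicia", "EpithetName": "faba"})
```

Once both records use the shared vocabulary they can be compared or merged directly.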
Summary
• Interoperation among distributed resources is essential for ‘added value’
• Techniques exist for dealing with
– communication between heterogeneous systems (e.g. Web Services; wrapping)
– communication between systems with heterogeneous data (e.g. ontologies)
• But not all the problems are solved!