Data Integration in Current Research Information Systems
Integration vs. Aggregation
Maximilian Stempfhuber
GESIS / IZ Social Science Information Centre
Bonn, Germany
euroCRIS IR Workshop, November 9, 2006
Topics
• What users want
• Current information landscape in Germany
• Aggregation vs. Integration: Dealing with heterogeneity
• Model for integrating decentralized and heterogeneous information
• Focus: Semantic level
• Integrating entities
• Coping with sustainability
Looking at the Scientific User
• Spends 0.5 days per week searching for information
• Most frequently used information sources
• Journals (73%)
• Internet search engines (71%)
• Books (67%)
• Personal / informal communication (52%)
• Scientific portals / subject gateways (39%)
• Big differences between disciplines
Boekhorst et al. 2003, Poll 2004
Why are Internet search engines preferred to dedicated (research) information systems?
What Scientific Users Want
• Specialized portals (deep indexing, integration)
• Interdisciplinary links (cluster search)
• Intelligent integration (all types of information)
• Quality ("no waste", role of search engines?)
• Quantity + relevance (but no information overflow)
• Direct access ("now-or-never", reference + source)
• Communication (invisible colleges)
In line with models from information science
Confirms results from recent surveys
Boekhorst et al. 2003, Poll 2004, IMAC 2002, RSLG 2002, Binder et al. 2001, Stahl et al. 1998, WWW Search Engines: Machill & Welp 2003
Demands of Other Types of Users (Examples)
University or State level
• Overview of all scholars / research units
• Overview of all research projects
• Overview of publications, internal/external co-operations, funding received, …
• Input by users, quality assurance by research officers, automatic reporting, benchmarking, visibility of research, data exchange, …
Federal level
• Research administration
• Benchmarking / rating / ranking of instruments, programs, research organizations and disciplines
Consequences and Difficulties for Building a CRIS
• Different types of information needed (e.g. research units, persons, projects, publications, datasets, co-operations)
• Only parts of the information produced in-house
• Information produced by different groups of people (researchers, administration, funding agency/reviewers)
• Large amounts of data must be externally acquired (e.g. other institutes, publishers, harvesting)
• Data is of different structure and quality, difficult to convert / analyze at a very detailed level; sometimes modification of data not allowed
• Not all data is visible to all users
• Different demands for information access and use
Difficult to convert to a standardized data model
Heterogeneity in the Information Landscape
Current Information Landscape in Germany (extract)
• National central libraries
• National research collection system (SSG) and Virtual Libraries (funding: DFG)
• Information networks (funding: BMBF)
• Research institutes
Building a national CRIS by Aggregation (vascoda.de)
www.vascoda.de
Aggregation – Heterogeneity as a Challenge
• Data types
• Indexing languages
• Metadata schemas
• User interfaces
• Technical interfaces
• Natural languages, …
Heterogeneous
Features of Aggregation
• Single point of access to information
• Standardized functions applied to all information
• Features reflect least common denominator
• Enforced standards
• Remaining differences ignored (or lead to exclusion)
• Information entities not connected
Meta search (if distributed)
Data Warehouse (if centralized)
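The "least common denominator" effect of aggregation can be illustrated with a minimal sketch. The source names and field names below are hypothetical and chosen only for illustration; the idea is that a search form applied uniformly to all sources can only expose the fields that every source provides.

```python
# Hypothetical field sets for three sources; all names are illustrative assumptions.
SOURCE_FIELDS = {
    "library_catalogue": {"title", "author", "year", "subject_headings"},
    "project_database": {"title", "author", "year", "funding_body", "duration"},
    "fulltext_repository": {"title", "author", "year", "abstract", "fulltext_url"},
}


def common_fields(source_fields):
    """Return the fields that every source provides (the least common denominator)."""
    field_sets = list(source_fields.values())
    return set.intersection(*field_sets) if field_sets else set()


def dropped_fields(source_fields):
    """Return, per source, the fields a uniform aggregated search cannot expose."""
    shared = common_fields(source_fields)
    return {name: fields - shared for name, fields in source_fields.items()}


shared = common_fields(SOURCE_FIELDS)
lost = dropped_fields(SOURCE_FIELDS)
```

In this toy setup only `title`, `author`, and `year` survive aggregation, while source-specific fields such as `funding_body` are ignored. An integrating system would instead keep those fields available as source-specific functions.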
Features of Integration
• Single point of access to information
• Source-specific functions available
• Features reflect different demands
• Enforced standards
• Remaining differences are treated
• Information entities tightly connected
SOWIPORT – CRIS for the Social Sciences
[Screenshot labels: Thematische Dokumentationen (thematic documentations), sowi, Reihe, SoFid]
SOWIPORT – Core Content
• GESIS
  • Literature references and full text documents
  • Projects, institutes, journals, WWW resources, …
  • Empirical data
• Partners
  • …
  • Library catalogues
  • Open Access journals
  • Topic-specific electronic publications
• Deutsche Forschungsgemeinschaft (DFG)
  • National licenses to electronic resources
SOWIPORT – Information Architecture 1/2
Theoretical Foundation: Layer Model
• Core: SOLIS, FORIS, SoLit, CSA, …
• Layer 1: Databases, OA Repository (Self archiving), Harvesting (Metadata), Reviews, Wikipedia, …
• Layer 2: Homepages of SOFO-Institutes, Harvesting (Grey literature), …
• Layer 3: Google Scholar, MS Academic Search, Scirus, …
Figure labels: SOWIPORT-Partners; Content: social sciences, scientific; Content indexing: systematic to not systematic; Heterogeneity: intellectual (CC), statistical, …
SOWIPORT – Information Architecture 2/2
Figure labels: Databases; Publications; Documentation unit (Service); Publication
SOWIPORT – Semantic Integration
Cross-concordances between thesauri (IZ, DZI, CSA)
Query transformation of the search term "Aktionsforschung":
• SOLIS: Aktionsforschung
• DZI SoLit: Handlungsforschung
• CSA: Action Research
Relevance Ranking
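The query transformation on this slide can be sketched as a lookup in a cross-concordance table. The example terms and the database-to-thesaurus assignment come from the slides; the data layout and the function name `transform_query` are assumptions made for illustration, not the actual SOWIPORT implementation.

```python
# Cross-concordance entry taken from the slide's example; the dict layout is an assumption.
CROSS_CONCORDANCE = {
    ("IZ", "Aktionsforschung"): {
        "DZI": ["Handlungsforschung"],
        "CSA": ["Action Research"],
    },
}

# Thesaurus used to index each database, as listed in the slides.
DATABASE_THESAURUS = {"SOLIS": "IZ", "SoLit": "DZI", "CSA": "CSA"}


def transform_query(term, source_thesaurus, databases):
    """Translate one controlled term into the vocabulary of each target database."""
    queries = {}
    for db in databases:
        target = DATABASE_THESAURUS[db]
        if target == source_thesaurus:
            # The database already uses the user's vocabulary.
            queries[db] = [term]
        else:
            mapped = CROSS_CONCORDANCE.get((source_thesaurus, term), {})
            # Fall back to the original term when no mapping exists.
            queries[db] = mapped.get(target, [term])
    return queries


result = transform_query("Aktionsforschung", "IZ", ["SOLIS", "SoLit", "CSA"])
```

Each database then receives the term in its own indexing vocabulary, and the result sets can be merged and ranked afterwards.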
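Treating Heterogeneity Between Indexing Vocabularies
[Figure: three approaches to mapping between vocabularies, labeled a), b), c). Recoverable labels: intellectual; statistical, parallel corpus; statistical search; relation types =, 1 : n, n : m]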
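The slide does not say which statistic is used for mappings derived from a parallel corpus. As one possible illustration, the sketch below assumes a corpus of documents indexed with two vocabularies and estimates, for each term of vocabulary A, how often a term of vocabulary B occurs on the same documents. The function name, the threshold, and the toy data (apart from the term pair shown in the slides) are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_mapping(parallel_corpus, min_confidence=0.5):
    """Derive term mappings A -> B from documents indexed with both vocabularies.

    parallel_corpus: iterable of (terms_a, terms_b) pairs, one pair per document.
    The confidence of a -> b is the share of documents carrying a that also carry b.
    """
    term_counts = Counter()
    pair_counts = defaultdict(Counter)
    for terms_a, terms_b in parallel_corpus:
        for a in set(terms_a):
            term_counts[a] += 1
            for b in set(terms_b):
                pair_counts[a][b] += 1

    mapping = {}
    for a, counts in pair_counts.items():
        candidates = [
            (b, n / term_counts[a])
            for b, n in counts.items()
            if n / term_counts[a] >= min_confidence
        ]
        # Highest confidence first; ties are broken alphabetically.
        mapping[a] = sorted(candidates, key=lambda item: (-item[1], item[0]))
    return mapping


# Toy corpus; only the pair Aktionsforschung / Action Research appears in the slides.
corpus = [
    (["Aktionsforschung"], ["Action Research"]),
    (["Aktionsforschung", "Methode"], ["Action Research", "Methodology"]),
    (["Methode"], ["Methodology"]),
    (["Aktionsforschung"], ["Action Research", "Participation"]),
]
mapping = cooccurrence_mapping(corpus)
```

Because one source term can map to several weighted target terms, and several source terms can share a target term, this kind of procedure naturally yields n : m relations rather than a single equivalence.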
SOWIPORT – Semantic Integration of Core Content
Cross-concordances between thesauri (DZI, IZ, CSA)
• Methodology for terminology mappings
  • intellectual
  • statistical
  • deductive
• Mapping between vocabularies
  • bilateral ("pure" model)
  • central vocabulary (efficiency)
Thesauri in SOWIPORT
• IZ (SOLIS, FORIS, WZB OPAC)
• DZI (SoLit)
• DZA (GeroLit)
• SWD (SGG OPAC)
• FES (FES OPAC)
• ASSIA (Applied Social Sciences Index and Abstracts)
• PEI (Physical Education Index)
• WPSA (Worldwide Political Science Abstracts)
• CSA (Soc. Abstr., Soc. Serv. Abstr.)
• MADIERA (Surveys)
• EuroThes (IBLK OPAC)
• FIS Bildung
• APA (Psyndex)
• BiSP (SpoLit, SpoFor, SpoMedia)
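The efficiency argument for a central vocabulary can be made concrete with simple counting. The sketch below assumes that one concordance covers a pair of vocabularies in both directions and that the central vocabulary is one of the vocabularies already in the list. If each direction is maintained separately, the bilateral figure doubles.

```python
def bilateral_mapping_count(n):
    """Concordances needed when every vocabulary is mapped to every other one."""
    return n * (n - 1) // 2


def central_mapping_count(n):
    """Concordances needed when all vocabularies are mapped to one central vocabulary.

    Assumes the central vocabulary is itself one of the n vocabularies.
    """
    return max(n - 1, 0)


# The slide lists 14 thesauri.
bilateral = bilateral_mapping_count(14)
central = central_mapping_count(14)
```

For the 14 thesauri listed on this slide, this gives 91 bilateral concordances versus 13 with a central vocabulary. The trade-off is that queries between two non-central vocabularies then pass through two mappings instead of one.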
vascoda: Context for Connecting Disciplines
Disciplines: Pedagogics, Psychology, Economics, Sports, Medicine, …
Cross-concordances between 12 thesauri
SOWIPORT – Structural, Local Integration (Core)
Partners' databases (RDBMS, Allegro, …)
sowiport-XML-Schema
DBClear
Services: Terminology service, Personalization, Authentication, …
Indexing / Retrieval
Self Archiving in SOWIPORT
Figure labels: Integrated Search; Product Catalogue; SOLIS; Literature Search; Communication; Homepages; CV + Publications (Self archiving); Full text; Repository; Metadata; SOFO; Affiliation
• Initial motivation: WR Evaluation
• Sustainability: Incentives
Self Archiving and Evaluation
SOLIS + CSA
DBClear
WR supplies names of universities and researchers
• Retrieval of publications
• Quality control
Review and additions by researchers
Evaluation by WR
Perspective:
• Basis for scholars' homepages / Who-is-who
• Self archiving in Open Access Repository
Transfer of metadata to the SOWIPORT core
Reflecting Integration at the UI Level
Conclusions
• Integration goes well beyond aggregation
• Shift from data orientation to an information use perspective necessary
• Challenges
  • Deal with heterogeneity at different levels
  • Integrate primary data with publications, …
  • Organize information sharing / access / sustainability
• Emerging infrastructures allow for integration
• Licensing and access issues are still a problem
Thank You!
Dr. Maximilian Stempfhuber
www.gesis.org/IZ