2012 12-12 research analytics - idea
DESCRIPTION
Just an idea I have about Research Analytics. The Dutch government would like to see performance indicators from the Higher Education institutions. This information can be drawn from many silos, each simply delivering its own indicator. However, when the information from these silos is combined, more advanced analytics can be derived from it using data and text mining techniques. The first step is to integrate the information in a scalable data infrastructure. Research information is not big data, but it is wise to think about scalability now to avoid performance issues in the future. Multiple sources provide data with a daily update, such as repositories' metadata, web log files, altmetrics data, citation references, funding and grant information, etc. The information is processed and implicit relations are made explicit. In the end it is provided as open data in various formats, including RESTful APIs. Services can draw information from these APIs to base their services on. Examples are metrics and analytics services, but also resolution and information portals.
TRANSCRIPT
CC-BY: Maurice Vanderfeesten
Research Intelligence
Research policy making
based on
Research Impact
Research Analytics Infrastructure
building an Open Data middleware infrastructure for research information
wisdom
intelligence
analytics
metrics
data
sources
What data do we have?
What can we measure?
What can we calculate?
What do the calculations mean?
What should be done with the meaning outcomes? Making policy
Impact interpretation
HE performance indicators
Impact factors
Calculations, Correlations & Comparison
Accumulations, Citations, Downloads, Mentions, Bookmarks, etc.
Publications, Datasets, Tweets, Projects, People, Grants, etc.
Repositories, Mendeley, CRIS’s, Blogs, NOD, Facebook, SciVerse, Twitter, Web of Science
What sources are out there?
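The lower levels of this pyramid can be sketched in a few lines, assuming hypothetical events keyed by a persistent identifier (PID): raw events are the data, counts per publication are the metrics, and a comparison against the mean is a very simple analytic. The field names and PID values below are invented for illustration and are not part of the original slides.

```python
from collections import Counter
from statistics import mean

# Data: hypothetical raw events harvested from sources (field names are assumptions).
events = [
    {"pid": "pub:1", "kind": "download"},
    {"pid": "pub:1", "kind": "download"},
    {"pid": "pub:1", "kind": "mention"},
    {"pid": "pub:2", "kind": "download"},
]


def metrics(events, kind):
    """Metrics: count the events of one kind per publication."""
    return Counter(e["pid"] for e in events if e["kind"] == kind)


def relative_to_mean(counts):
    """Analytics: express each publication's count relative to the mean count."""
    avg = mean(counts.values())
    return {pid: n / avg for pid, n in counts.items()}


downloads = metrics(events, "download")
print(downloads)
print(relative_to_mean(downloads))
```

Interpreting these numbers (intelligence) and acting on them (wisdom) remain human tasks in this sketch.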
Problem statement
Current research policy is based on homogeneous, one-sided metrics from a single source. Yet making impact spans a much wider spectrum.
Current situation: established
wisdom
intelligence
analytics
metrics
data
sources
Citations
A-journal, peer reviewed
Publications
Thomson Reuters ISI Web of Science
H-index, Leiden index
Shanghai index
Making policy
Impact interpretation
HE performance indicators
Impact factors
What data do we have?
What can we measure?
What can we calculate?
What do the calculations mean?
What should be done with the meaning outcomes?
What sources are out there?
Current situation: experimental
wisdom
intelligence
analytics
metrics
data
sources
Downloads
Open Access Publications
Repositories
Open Access Popularity index?
Making policy
Impact interpretation
Not yet, HE performance indicators
Impact factors
What data do we have?
What can we measure?
What can we calculate?
What do the calculations mean?
What should be done with the meaning outcomes?
What sources are out there?
wisdom
intelligence
analytics
metrics
data
sources
Mentions /Altmetrics
Tweets, Bookmarks
Twitter, Mendeley
Like-ability index?
Making policy
Impact interpretation
Not yet, HE performance indicators
Impact factors
What data do we have?
What can we measure?
What can we calculate?
What do the calculations mean?
What should be done with the meaning outcomes?
What sources are out there?
Business case: only based on user driven demand
wisdom
intelligence
analytics
metrics
data
sources
Policy makers must answer:
Impact interpreters must answer:
HE performance indicators
Impact factors
Calculations, Correlations & Comparison
Accumulations, Citations, Downloads, Mentions, Bookmarks, etc.
Publications, Datasets, Tweets, Projects, People, Grants, etc.
Repositories, Mendeley, CRIS’s, Blogs, NOD, Facebook, SciVerse, Twitter, Web of Science
Only the HE community can make a Business case
SURF can orchestrate the ICT infrastructure and licences
What should the calculations mean?
What could be done with the meaning outcomes?
Full Spectrum Analytics
position policy making on a holistic perspective
Imagine…
• A single entry point where you can find the research information you need
• Where you can combine research information to build rich analytics
• That gets updated daily from different sources of information
• That is scalable and performs fast, so that many services can draw from it
Community Cloud
Scalable Research Information storage and restructuring, from and by the Dutch Higher Education sector
[Diagram: community cloud]
Columns: Sources | Data types | Services | Users/Partners
Centre: Research Analytics middleware (Information Brokerage and Restructuring) on a distributed scalable database (community cloud); licences
Users/Partners: CBS, CWTS, Researchers, ministry, Libraries
Headings left open on the slide: Holding: | Proven Open Technology: | Technology Partners: | Research Partners:
Also labelled: institutions
Business Case: Performance Indicators for Dutch Higher Education
ministry, HE institutions
Current solutions
specific use case centric
[Diagram: current solutions built around repositories]
Columns: Sources | Data types (data warehouses) | Services | Users
Sources: repositories
Feeds: metadata (twice), web statistics
Data types: Publications [PID][metadata][DAI]; Authors [DAI][PID]; Resolution [PID][URL]; web usage [http event][PID][GEO]
Services: NARCIS, Resolution, Repository metrics
Users: Researchers, Libraries
[Diagram: current solutions for metrics, one data warehouse per service]
Columns: Sources | Data types (data warehouses) | Services | Users
Sources: repositories, publishers, Web of Science, Mendeley
Feeds: web statistics (twice), altmetrics, citation
Data types: Web usage [http event][PID][GEO] (twice); Alt table [social media event][PID]; Citation table [citation reference][PID]
Services: Repository metrics, Subscription metrics, Social metrics, Citation metrics
Users: CWTS, Researchers, Libraries (twice)
Also labelled: institutions
Questions
• With all these sources, why does every service need to build a separate data warehouse?
• What are the possibilities if we could combine all the information needed for different services, and put it in one community cloud?
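As a rough illustration of the second question, the sketch below combines three tables that today sit in separate warehouses into one record per publication, using the PID as the shared key. The table contents are invented, and the choice of counters (downloads, mentions, citations) is an assumption made for this example.

```python
from collections import defaultdict

# Hypothetical per-service tables, each keyed by PID (all values invented).
usage_table = [("pub:1", "http event"), ("pub:1", "http event"), ("pub:2", "http event")]
alt_table = [("pub:1", "tweet"), ("pub:2", "bookmark"), ("pub:2", "tweet")]
citation_table = [("pub:1", "pub:2")]  # (cited PID, citing PID)


def combine(usage, alt, citations):
    """Build one record per PID from three tables that are otherwise kept apart."""
    combined = defaultdict(lambda: {"downloads": 0, "mentions": 0, "citations": 0})
    for pid, _event in usage:
        combined[pid]["downloads"] += 1
    for pid, _event in alt:
        combined[pid]["mentions"] += 1
    for cited_pid, _citing_pid in citations:
        combined[cited_pid]["citations"] += 1
    return dict(combined)


profile = combine(usage_table, alt_table, citation_table)
for pid, counts in sorted(profile.items()):
    print(pid, counts)
```

Once the records share one store, any service can read the same combined view instead of rebuilding it.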
From Metrics to Analytics
Integrating sources of information, making trends across sources visible
Analytics
Metrics
[Diagram: one shared data warehouse feeding a Metrics service]
Columns: Sources | Data types (data warehouse) | Services | Users
Sources: repositories, publishers, Scopus, Web of Science, Mendeley
Feeds: metadata, web statistics (twice), altmetrics, citation
Data types: Usage table [http event][PID][GEO]; Alt table [social media event][PID]; Citation table [citation reference][PID]
Services: Metrics
Users: CBS, CWTS, Researchers, OCW, Library
Also labelled: institutions
[Diagram: one shared data warehouse feeding an Analytics service]
Columns: Sources | Data types (data warehouse) | Services | Users
Sources: repositories, publishers, Scopus, Web of Science, Mendeley
Feeds: web statistics (twice), altmetrics, citation
Data types: Usage table [http event][PID][GEO]; Alt table [social media event][PID]; Citation table [citation reference][PID]
Services: Analytics
Users: CBS, CWTS, Researchers, OCW, Library
Also labelled: institutions
Adding sources, adding services
Potential for a high performing community cloud, as a middleware solution for shared services in the Research Information and Research Analytics domain.
Community Cloud as middleware for Research Information Services
Resolution, NARCIS, Analytics, Repository metrics, HBO kennisbank
[Diagram: community cloud as Research Analytics middleware (Information Brokerage and Restructuring)]
Columns: Sources | Data types | Services | Users
Sources: repositories, publishers, Scopus, Web of Science, Mendeley, OCLC, ISNI/DAI, CRIS’ CERIF
Feeds: metadata, web statistics (twice), altmetrics, citation
Data types, held in a distributed scalable database (community cloud):
  Publications: [PID] [metadata] [DAI]
  Authors: [DAI] [ISNI/ORCID] [PID]
  Usage events: [http event] [PID] [GEO]
  Resolution entries: [PID] [URL]
  Altmetrics: [social media event] [PID]
  Citation references: [citation reference] [PID]
  Projects: [ProjectID] [PID]
  Etc… (scalable): […] […]
Services: Resolution, NARCIS, Analytics, Repository metrics, HBO kennisbank
Users: CBS, CWTS, Researchers, OCW, Library
Also labelled: institutions, data mesh
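A few of the data types on this slide could be modelled as plain records. The sketch below is one possible reading, not the actual schema: it assumes that PID stands for persistent identifier and DAI for Digital Author Identifier, and all class names, field names and the `resolve` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    pid: str         # persistent identifier of the publication
    metadata: dict   # descriptive metadata as harvested from a repository
    dais: tuple      # author identifiers (DAI) linked to this publication


@dataclass(frozen=True)
class UsageEvent:
    http_event: str  # e.g. the request line taken from a web log
    pid: str         # publication the event refers to
    geo: str         # coarse location of the request


@dataclass(frozen=True)
class ResolutionEntry:
    pid: str
    url: str


def resolve(entries, pid):
    """Return the URL registered for a PID, or None when the PID is unknown."""
    for entry in entries:
        if entry.pid == pid:
            return entry.url
    return None


# Invented example data; repository.example.org is a placeholder host.
entries = [ResolutionEntry("pub:1", "https://repository.example.org/1")]
print(resolve(entries, "pub:1"))
```

Because every record carries a PID, the same key links publications, usage events and resolution entries, which is what lets several services share one store.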
Use Cases
[Diagram: Research Analytics middleware (Information Brokerage and Restructuring) on a distributed scalable cloud database, annotated with numbered steps 1 to 5]
Columns: Sources | Data types | Services | Users
Sources: repositories, publishers, Scopus, Web of Science, Mendeley, OCLC, ISNI/DAI, CRIS’ CERIF
Feeds: metadata, web statistics (twice), altmetrics, citation
Tables: Publications [PID][metadata][DAI]; Authors [DAI][ISNI/ORCID][PID]; Usage [http event][PID][GEO]; Resolution [PID][URL]; Alt [social media event][PID]; Citation [citation reference][PID]; Projects [ProjectID][PID]; Etc… […][…]
Services: Resolution, NARCIS, Analytics, Repository metrics, HBO kennisbank
Users: CBS, CWTS, Researchers, OCW, Library
Also labelled: n! Faculties, data mesh
Use Case: Library wants to see the usage of its repository, compared to other repositories.
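This use case amounts to a join of the Usage table with the Publications table on PID, followed by a count per repository. The sketch below assumes that the publication metadata records which repository holds the item; that `repository` key, the repository names and all rows are invented for illustration.

```python
from collections import Counter

# Hypothetical rows; the "repository" key inside the metadata is an assumption.
publications = {
    "pub:1": {"repository": "repo-a"},
    "pub:2": {"repository": "repo-a"},
    "pub:3": {"repository": "repo-b"},
}
usage = [
    {"pid": "pub:1", "geo": "NL"},
    {"pid": "pub:1", "geo": "DE"},
    {"pid": "pub:2", "geo": "NL"},
    {"pid": "pub:3", "geo": "NL"},
]


def usage_per_repository(publications, usage):
    """Count usage events per repository by joining both tables on PID."""
    return Counter(
        publications[event["pid"]]["repository"]
        for event in usage
        if event["pid"] in publications
    )


def compare(own, counts):
    """Return the own count and the mean count of all other repositories."""
    others = [n for repo, n in counts.items() if repo != own]
    return counts[own], (sum(others) / len(others) if others else 0.0)


counts = usage_per_repository(publications, usage)
print(compare("repo-a", counts))
```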
[Diagram: the same Research Analytics middleware architecture (sources, tables in the distributed scalable cloud database, services, users), annotated with numbered steps 1 to 5; steps 3, 4 and 5 are marked twice]
Use Case: CWTS wants to compile the Leiden index by combining information from a variety of sources.
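One small part of combining sources is avoiding double counting when two providers report the same citation. The sketch below merges citation references from two hypothetical feeds before counting; it does not reproduce the Leiden index methodology, and the PIDs and pairs are invented.

```python
from collections import Counter

# Hypothetical citation references as (cited PID, citing PID) pairs per source.
scopus = {("pub:1", "pub:7"), ("pub:1", "pub:8"), ("pub:2", "pub:7")}
web_of_science = {("pub:1", "pub:8"), ("pub:1", "pub:9")}


def merged_citation_counts(*sources):
    """Union the references so a citation reported by two sources counts once."""
    references = set().union(*sources)
    return Counter(cited for cited, _citing in references)


citation_counts = merged_citation_counts(scopus, web_of_science)
print(citation_counts)
```

The merge only works because both feeds identify publications by the same PID, which is the role of the shared identifiers in the tables above.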
[Diagram: the same Research Analytics middleware architecture (sources, tables in the distributed scalable cloud database, services, users), annotated with numbered steps 1 to 5; steps 4 and 5 are marked twice]
Use Case: Researcher wants to view the download ratio between his Open Access and toll-gated publications, and compare this to his co-authors' ratios.
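A minimal sketch of this use case, assuming the ratio means Open Access downloads divided by toll-gated downloads, and assuming each publication record carries an `open_access` flag. Both the definition and the flag are assumptions for this example, and the numbers are invented.

```python
from collections import Counter

# Hypothetical publications with author identifiers (DAI) and an assumed open_access flag.
publications = [
    {"pid": "pub:1", "dais": ["dai:a", "dai:b"], "open_access": True},
    {"pid": "pub:2", "dais": ["dai:a"], "open_access": False},
    {"pid": "pub:3", "dais": ["dai:b"], "open_access": False},
]
downloads = Counter({"pub:1": 30, "pub:2": 10, "pub:3": 15})


def oa_ratio(dai, publications, downloads):
    """Open Access downloads divided by toll-gated downloads for one author."""
    own = [p for p in publications if dai in p["dais"]]
    oa = sum(downloads[p["pid"]] for p in own if p["open_access"])
    toll = sum(downloads[p["pid"]] for p in own if not p["open_access"])
    return oa / toll if toll else None  # undefined without toll-gated downloads


def co_authors(dai, publications):
    """All authors who share at least one publication with the given author."""
    return {d for p in publications if dai in p["dais"] for d in p["dais"]} - {dai}


own_ratio = oa_ratio("dai:a", publications, downloads)
peer_ratios = {d: oa_ratio(d, publications, downloads) for d in co_authors("dai:a", publications)}
print(own_ratio, peer_ratios)
```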
What does the user see?
Research Analytics Dashboards