TRANSCRIPT
a portfolio company of!
CASRAI ReConnect Webinar: Research classification 2
ReConnect Webinar : Research classification in practice 2
March 18, 2014
ÜberResearch helps funding and research institutions better understand and report on their portfolios,
looking internally and comparing globally.
www.uberresearch.com [email protected]
[email protected] [email protected]
Research classification in practice: Stamping or Understanding!
About ÜberResearch
ÜberResearch: About us
• The team has 10 years of experience delivering tools and services for funding and research institutions
• Over 20 development partners
• Active member of CASRAI, ORCID, CrossRef / FundRef, TAG
• Example: UberWizard for ORCID
• Portfolio company of Digital Science, the younger sibling of the Nature Publishing Group
Experienced, international team
Part of the Digital Science family
About ÜberResearch
ÜberResearch: What we provide
Tools and services
• Portfolio analysis and reporting
• Reviewer identification
• Categorization tools and support
• Integrations – systems and content
• Leveraging a growing global award database
• 1.2m grants from 60+ funders
• Covering more than $600 billion of historic funding
Today’s contents & goals
1. Recap from webinar 1
2. Sharing experiences with existing classification systems
– Gerry Lawson (NERC) on the RCUK classification
– CADRO – a disease specific classification
3. Making Research Classification operational
– Semantic research classification
– Use cases and document sets
– Implementing and operating research classification systems
– Discussion
1. RECAP & QUESTIONS FROM WEBINAR 1
Different Levels of Classification Systems
Classifications across all of science (e.g. Australian and New Zealand Standard Research Classification (ANZSRC), CASRAI Standard Classification Scheme)
Some are discipline-specific: Health Research Classification System (HRCS), ICD10, the NIH’s Research, Condition, and Disease Categorization (RCDC) or MeSH used as a classification.
Some are domain-specific for narrower topic areas, like Common Alzheimer's Disease Research Ontology (CADRO) for Alzheimer’s; Common Scientific Outline (CSO) for Cancer, etc.
Funder & organization specific, and one of the three levels above: RCUK Subject Classifications, RCDC from NIH, DFG Disciplinary Classification System.
Even divisional and personal ones – classification categories to plan and structure programs etc.
Summary: ‘How applied’?
[Figure: the four application approaches (manual; semantic / automatic; training set; thesaurus-only indexing / automatic) rated from Low to High on quality of coding, costs, applicability to various document sets, ability to mature with science, and breadth of use cases.]
Key takeaways from webinar 1
Use case
Classification schema (own, existing, one, many)
Application routine (manual, semantic)
gtrdocs.hackpad.com
Numbers of classes ‘originating’ with each Council

Council   Research.Subject   Research.Topic
AHRC             14               138
BBSRC             9                77
EPSRC            24               150
ESRC             14                24
NERC              8                50
STFC              9                43
Total            78               482
[Figure: NERC Second Level Classification (Topics). Proportional awards for starts each calendar year; 50 NERC Core Topics only.]
[Figure: NERC Top Level Classification (Subjects). Proportional awards for starts each calendar year; top 30 Subjects.]
IADRP – International Alzheimer’s Disease Research Portfolio
• Collaborative effort of the US NIH National Institute on Aging (NIA) and the Alzheimer’s Association (AA)
• To help coordinate and plan Alzheimer’s research
– …enable strategic coordination of efforts among funding agencies but it will also be a tremendous resource for the research community, and the public at large. Source: NIA
• Began in 2010
• Public database available now, and adding funders globally
IADRP’s classification scheme – “CADRO”
• CADRO – Common Alzheimer’s Disease Research Ontology
• A unified classification system to enable comparative analysis
• Jointly developed by the NIA and AA
• Created from a study of NIA and AA funded projects from FY04-FY10
• Exemplar of a deep / granular scheme
• Seven top-level categories, a three-tiered scheme to capture the complete range of AD research – basic, translational and clinical – and AD research-related resources.
• Manually coded, with one code per project
• Coding performed by NIA IADRP, AA staff, and the staff of the other participating funders with help from the CADRO Coding Guidelines.
Sources: NIA, NIA CADRO
Source: IADRP
An example three-level code, using a topic recently in the news – a possible blood test for AD:
Code: “B.1.b”:
• Category B. Diagnosis, Assessment, and Disease Monitoring
• 1. Fluid Biomarkers
• b. Blood Biomarkers
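The three-tier structure of a code such as "B.1.b" can be illustrated with a small sketch. This is not part of the CADRO tooling; the function names and the label table are hypothetical, and the table holds only the single example shown on the slide.

```python
from typing import NamedTuple


class CadroCode(NamedTuple):
    category: str  # top-level category letter, e.g. "B"
    topic: str     # second-tier number, e.g. "1"
    subtopic: str  # third-tier letter, e.g. "b"


def parse_cadro_code(code: str) -> CadroCode:
    """Split a dotted three-level code such as "B.1.b" into its tiers."""
    parts = code.strip().split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three dot-separated tiers, got {code!r}")
    return CadroCode(*parts)


# Labels for the one example given on the slide; nothing else is known here.
LABELS = {
    "B": "Diagnosis, Assessment, and Disease Monitoring",
    "B.1": "Fluid Biomarkers",
    "B.1.b": "Blood Biomarkers",
}


def describe(code: str) -> list[str]:
    """Return the labels from the top-level category down to the full code."""
    c = parse_cadro_code(code)
    prefixes = [
        c.category,
        f"{c.category}.{c.topic}",
        f"{c.category}.{c.topic}.{c.subtopic}",
    ]
    return [LABELS.get(p, "(label not listed)") for p in prefixes]


path = describe("B.1.b")
```

Because each project carries exactly one code, roll-ups to the top-level category only need the first tier of the parsed code.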
IADRP – classification use cases
A portfolio analysis tool to:
• Track changes in the AD research landscape over time
• Identify research gaps and areas of overlap within and across AD funding agencies
• Identify collaborative opportunities aimed at advancing AD research and alleviating the socioeconomic burden of this devastating disease.
Source: NIA
2. SEMANTIC RESEARCH CLASSIFICATION
Two non-exclusive options when creating semantic classification
1. Modelling an existing classification with semantic expressions
– E.g. RCDC (done by us), CASRAI classification etc.
– Starting with an existing one is perfect - since a common understanding exists already
– It is just ‘translating’ it to a ‘machine readable’ definition
2. Building a new category schema
– Requires same process leading to common understanding
– Semantic modelling can help to inform the discussion process
– Result is a ‘machine readable’ classification
– More work than modelling an existing system
Continuum from Search to Categorisation
Search
• Quick overview
• High flexibility
• Challenge of long tail of irrelevant results
• Flat
• ‘Instant’ category
Classification
• Agreed on definitions
• ‘Ordering’ a discipline or field
• Hierarchy
• ‘Static’ category
The difference is who is involved, how it is defined and how much time is spent on the query / definition. Not more.
RCDC Example – Modelling a Classification Semantically
• RCDC is a classification from the NIH, using thesauri as a basis for a classification
• Normally requires a text mining technology called Fingerprinting (Collexis)
• ÜberResearch remodelled the RCDC category system to make it applicable to other funders’ portfolios and datasets
• Complex definitions requiring terms to be above certain thresholds, combinations of terms etc.
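A minimal sketch of what a definition built from term thresholds and term combinations could look like. It is not the RCDC, Collexis or ÜberResearch implementation; the class names, the category, the terms and all weights are hypothetical, and the per-document term weights are assumed to come from some upstream text mining step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermRule:
    term: str
    min_weight: float  # the term's weight in the document must reach this value


@dataclass(frozen=True)
class CategoryDefinition:
    name: str
    all_of: tuple[TermRule, ...] = ()  # every rule must hold
    any_of: tuple[TermRule, ...] = ()  # at least one must hold, if any are listed

    def matches(self, weights: dict[str, float]) -> bool:
        def holds(rule: TermRule) -> bool:
            return weights.get(rule.term, 0.0) >= rule.min_weight

        if not all(holds(r) for r in self.all_of):
            return False
        return not self.any_of or any(holds(r) for r in self.any_of)


# Hypothetical definition: "statin" must be prominent, plus one related term.
statin_research = CategoryDefinition(
    name="Statin research",
    all_of=(TermRule("statin", 0.3),),
    any_of=(TermRule("cholesterol", 0.2), TermRule("atherosclerosis", 0.2)),
)

# Hypothetical term weights for two documents.
doc_a = {"statin": 0.6, "cholesterol": 0.4}
doc_b = {"statin": 0.1, "atherosclerosis": 0.9}

in_a = statin_research.matches(doc_a)  # statin above threshold, cholesterol too
in_b = statin_research.matches(doc_b)  # statin mentioned, but below threshold
```

Keeping the definition as data rather than code means subject specialists can review and adjust thresholds without touching the matching routine.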
Challenges when designing a semantic category…
• Common understanding (internal and external) of the classification term – that is, do all topic specialists agree which grants should be in/out of this set? Definitions matter.
• Are you expecting grants to appear in multiple sets? Do you mind ‘double counting’?
– Report on amount of funding on Atherosclerosis and on statin-related research
• 2700 projects on Atherosclerosis, 403 on Statin
• 72 are overlapping. Where should they be assigned? Or can they be double counted?
• Accept that there can never be a definitive answer – there will always be grants whose inclusion or exclusion is a moot point.
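The effect of double counting in the Atherosclerosis / Statin example can be worked through with the project counts above. The sketch covers project counts only; funding amounts would need per-project figures, which the slide does not give.

```python
def distinct_count(size_a: int, size_b: int, overlap: int) -> int:
    """Number of distinct items in two sets that share `overlap` items."""
    return size_a + size_b - overlap


atherosclerosis = 2700  # projects in the Atherosclerosis set
statin = 403            # projects in the Statin set
overlap = 72            # projects that appear in both sets

# Reporting the two categories side by side counts the overlap twice.
summed_separately = atherosclerosis + statin

# Counting each project once removes the shared projects a single time.
distinct_projects = distinct_count(atherosclerosis, statin, overlap)

print(summed_separately)  # 3103
print(distinct_projects)  # 3031
```

If each project must sit in exactly one category, the 72 shared projects have to be assigned by an explicit rule; if double counting is acceptable, category totals will exceed the portfolio total by the overlap.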
Challenges when designing a semantic category…
[Figure labels: false positives; long tail of less relevant documents; get all documents which are relevant]
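One common way to quantify the tension between false positives and getting all relevant documents is precision and recall against an expert-curated reference set. The webinar does not name these measures, so the sketch below is an assumption about how a category might be checked; the grant identifiers are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: grants that specialists agree belong in the category,
# and grants that the semantic category definition actually returned.
expert_set = {"G1", "G2", "G3", "G4"}
category_hits = {"G1", "G2", "G3", "G9", "G10"}

precision, recall = precision_recall(category_hits, expert_set)
false_positives = category_hits - expert_set  # returned but not relevant
missed = expert_set - category_hits           # relevant but not returned
```

Loosening a definition tends to raise recall and lengthen the tail of less relevant hits; tightening it does the opposite, which is why specialist review of both lists remains necessary.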
It is clear that…. Subject Matter Expertise…
• Is still required…
• Semantic categories are just a way to preserve it…
3. USE CASES, RESEARCH CLASSIFICATIONS AND DOCUMENT SETS
Automation allows more content to be tagged…
• Grant applications, reports
• Patents
• Publications
• Own grants
• Other grant portfolios
• Clinical trials
Which document sets for which use cases?
[Matrix of use cases against document sets; the cell markings are not recoverable.]
Use cases:
• Categorising portfolio for science policy level
• Comparing and aligning activities with other funders
• Reporting on research funding
• Planning – define ‘grand challenges’
• Reviewer identification
• Connecting output / evaluation data with grants
• Gap analysis / trend analysis
• Report financial side of research funding
• Impact / outcome analysis
• Understanding own portfolio in detail
Document sets:
• Internal documents (applications, progress reports)
• Own awards
• Global awards
• Publications (general and related)
• Patents
• Clinical trials
4. OPERATIONALIZING RESEARCH CLASSIFICATION
Content to be tagged…
• Integration of content requires
– Collecting content
– Harmonizing it
– Disambiguating it across data sources (in addition to ORCID)
– Integrating it into one database
– Building the analytical layer
– Building the process support
[Diagram: Grants, Publications and Other grants tagged with Classification 1 and Classification 2]
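The idea of bringing several content types into one structure and tagging each item with more than one classification can be sketched as follows. This is an illustration under stated assumptions, not the ÜberResearch system: the record fields, the input field names, the toy keyword classifiers and the sample texts are all hypothetical, and the disambiguation step is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    source: str    # where the item came from, e.g. "own grants"
    doc_type: str  # "grant", "publication", ...
    title: str
    text: str
    tags: dict[str, list[str]] = field(default_factory=dict)  # scheme -> categories


def harmonize(raw: dict, source: str, doc_type: str,
              title_key: str, text_key: str) -> Record:
    """Map one source-specific item onto the shared record layout."""
    return Record(source, doc_type, raw[title_key].strip(), raw[text_key].strip())


def keyword_classifier(keywords: dict[str, list[str]]) -> Callable[[str], list[str]]:
    """Toy classifier: a category applies if any of its keywords occurs."""
    def classify(text: str) -> list[str]:
        lowered = text.lower()
        return sorted(cat for cat, words in keywords.items()
                      if any(w in lowered for w in words))
    return classify


def tag_all(records: list[Record],
            classifiers: dict[str, Callable[[str], list[str]]]) -> list[Record]:
    """Apply every classification scheme to every record."""
    for rec in records:
        for scheme, classify in classifiers.items():
            rec.tags[scheme] = classify(rec.text)
    return records


records = [
    harmonize({"Title": " Blood biomarkers for AD ",
               "Abstract": "A blood test for Alzheimer's disease."},
              "own grants", "grant", "Title", "Abstract"),
    harmonize({"title": "Statins and plaque",
               "summary": "Statin therapy in atherosclerosis."},
              "publications", "publication", "title", "summary"),
]
classifiers = {
    "Classification 1": keyword_classifier(
        {"Neurology": ["alzheimer"], "Cardiovascular": ["atherosclerosis"]}),
    "Classification 2": keyword_classifier(
        {"Diagnostics": ["blood test"], "Therapeutics": ["statin", "therapy"]}),
}
tag_all(records, classifiers)
```

Once every item shares one layout, adding a further classification means adding one entry to `classifiers` rather than re-coding each document set.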
Applying research classifications
• Potentially developing a bespoke classification to suit your needs
• Possibly leveraging existing implementations of semantic representations of classifications (RCDC, CASRAI etc.)
• Implementing the routines to automatically apply the categories
Key takeaways from webinar 1 & 2
Use case
Classification schema (own, existing, one, many)
Application routine (manual, semantic)
Content to be integrated (applications, grants, other funders’ portfolios, publications, patents, trials)
Implementation and operation
Imagine a Babel Fish for Research Classifications…*
• One system
• All content
• All classification systems with automatic routines to assign them
• Analytical views to get insights in seconds
• Report consistently leveraging the above
• … all other use cases served as well from the same data and same application…
* “Mapping global health research investments, time for new thinking - A Babel Fish for research data” (Terry et al. 2012)
Our Babel Fish has to learn a few classifications still, but…
• Shared database
• Classification systems already modelled / in process
• Support ranging from ad hoc searches to modelling classifications
• Supports the ability to share them with others
• 22 development partners (mostly funders) helping us to define the features and functions
• Realising savings for all funders by taking care of the basics as a shared solution
• Launched together with ORCID in the beginning of February
• Free and open tool that lets researchers add their grants from many funders in one wizard
• Growing database of 1.2m grants from 60+ funders
• Funders can benefit from it by adding their grants at no cost
• Contribution to drive ORCID adoption and the representation of grants in the ORCID records
Free and open tool…
5. DISCUSSION & QUESTIONS
Thank you!
Christian Herzog ([email protected])
Ashlea Higgs ([email protected])
Steve Leicht ([email protected])