1 © WIPO – 2003 PF & CJF
CLAIMS
Computer-Assisted Categorisation of Patent Documents in the International
Patent Classification
Patrick Fiévet, CLAIMS Project Manager, WIPO & Caspar J. Fall, CLAIMS Consultant, ELCA
ICIC’03, Nîmes, 22 October 2003
CLassification Automated InforMation System
2 © WIPO – 2003 PF & CJF
Agenda
1. Introduction to CLAIMS project (PF)
2. Computer-assisted categorization prototypes (CJF)
3. CLAIMS Categorizer perspectives (PF)
3 © WIPO – 2003 PF & CJF
1. Introduction to CLAIMS Project
4 © WIPO – 2003 PF & CJF
1.1 CLAIMS Context
World Intellectual Property Organization (WIPO)
International Patent Classification (IPC)
Classification Automated Information System (CLAIMS)
5 © WIPO – 2003 PF & CJF
1.2 CLAIMS Project Objectives
• IPC Reform and revision support
• IPC Categorization assistance to Patent Offices
• IPC Tutorials
• Translation and Natural Language Search in the IPC
IT support for the promotion of the IPC
6 © WIPO – 2003 PF & CJF
2. Computer-assisted Categorization
7 © WIPO – 2003 PF & CJF
2.1 Objectives
Develop a solution for predicting International Patent Classification (IPC) codes
• Facilitate accurate classification in small and medium patent offices
• Support for documents in multiple languages
• Categorization assistance tool
Open questions
• Depth of computer-assisted categorization
• What accuracy?
8 © WIPO – 2003 PF & CJF
2.1 Key issues
Survey of automated categorization research
Patent categorization
• The IPC is a hierarchical classification
» 120 classes, 628 subclasses, 69’000 groups
» Patents have secondary IPC codes
• The categories are modified over time
• Vocabulary very diverse and technical
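The hierarchy described on this slide can be made concrete with a small parser. This is a minimal sketch, assuming the usual layout of an IPC symbol (section letter A to H, two-digit class, subclass letter, then main group and subgroup separated by a slash, with `/00` denoting the main group itself). The names `IpcSymbol` and `parse_ipc` are hypothetical and not part of CLAIMS.

```python
import re
from typing import NamedTuple


class IpcSymbol(NamedTuple):
    section: str     # e.g. "A"
    ipc_class: str   # e.g. "A61"
    subclass: str    # e.g. "A61D"
    main_group: str  # e.g. "A61D 1/00"


# Tolerates the stray spaces often found in extracted text, e.g. "A 61D1 /00".
_IPC_RE = re.compile(r"^([A-H])\s*(\d{2})\s*([A-Z])\s*(\d{1,4})\s*/\s*(\d{2,})$")


def parse_ipc(symbol: str) -> IpcSymbol:
    """Split an IPC symbol into its hierarchical levels."""
    match = _IPC_RE.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable IPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, group, _subgroup = match.groups()
    ipc_class = section + class_digits
    subclass = ipc_class + subclass_letter
    # The main group is the group number followed by "/00".
    return IpcSymbol(section, ipc_class, subclass, f"{subclass} {group}/00")


symbol = parse_ipc("A 61D1 /00")
print(symbol.subclass, "|", symbol.main_group)
```

Truncating a symbol at a given level in this way is what lets one set of labelled documents serve as training data for class, subclass or main group categorization.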
9 © WIPO – 2003 PF & CJF
2.1 Patent categorization approach
Machine-learning method to recognize categories
» Statistical distribution of words
Establish training data
» Training documents with good IPC codes
» 210’000 to 830’000 documents
Advantages
• No need for keywords
• Easy to train the tools
• Can support many languages
Disadvantages
• Never absolute certainty in the results
• Difficult to have reliable full automation
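The slides do not name the algorithm behind the prototype. As a minimal sketch of what learning categories from the "statistical distribution of words" can look like, the following uses multinomial Naive Bayes with add-one smoothing as an assumed stand-in. The class name `NaiveBayesCategorizer`, the toy training texts and the subclass labels attached to them are illustrative only and are not project data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Whitespace tokenizer; no keyword list or language-specific rules."""
    return text.lower().split()


class NaiveBayesCategorizer:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training documents
        self.vocabulary = set()

    def train(self, documents):
        """documents: iterable of (text, category) pairs."""
        for text, category in documents:
            self.doc_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

    def rank(self, text):
        """Return all known categories, most probable first."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = tokenize(text)
        scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[category] = score
        return sorted(scores, key=scores.get, reverse=True)


# Toy examples standing in for training documents with good IPC codes.
training = [
    ("veterinary instrument for animal treatment", "A61D"),
    ("device for restraining animals during surgery", "A61D"),
    ("dental drill and tooth filling tool", "A61C"),
    ("tooth implant and dental prosthesis", "A61C"),
]
categorizer = NaiveBayesCategorizer()
categorizer.train(training)
ranking = categorizer.rank("dental tool for tooth repair")
print(ranking)
```

The output is a ranking rather than a single answer, which fits the point above that the results never carry absolute certainty: the tool proposes codes and a person decides.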
10 © WIPO – 2003 PF & CJF
2.2 Prototype
Custom development
• State-of-the-art algorithm
• Language independent
Measure categorization success
• Compare the predictions with other manually classified documents
[Figure: three evaluation panels, each comparing the ranked guesses (1, 2, 3) with the real codes (mc, ic): "Top prediction", "Three guesses", "All classes"]
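The three panel titles suggest three ways of scoring a prediction. The sketch below is one reading of them, assuming `mc` is a document's main IPC code and `ic` its secondary codes: "Top prediction" counts a hit when the first guess equals `mc`, "Three guesses" when `mc` appears among the first three guesses, and "All classes" when the first guess equals `mc` or any `ic`. The function names and sample data are hypothetical.

```python
def top_prediction(guesses, mc, ics):
    """Hit if the best guess is the main code."""
    return guesses[0] == mc


def three_guesses(guesses, mc, ics):
    """Hit if the main code is among the three best guesses."""
    return mc in guesses[:3]


def all_classes(guesses, mc, ics):
    """Hit if the best guess is the main code or any secondary code."""
    return guesses[0] == mc or guesses[0] in ics


def precision(results, measure):
    """Fraction of test documents counted as hits under the given measure."""
    hits = sum(1 for guesses, mc, ics in results if measure(guesses, mc, ics))
    return hits / len(results)


# Each entry: (ranked guesses, real main code, real secondary codes).
results = [
    (["A61D", "A61C", "A61B"], "A61D", ["A61B"]),
    (["A61C", "A61D", "A61B"], "A61D", ["A61C"]),
    (["A61B", "A61C", "A62B"], "A61D", []),
    (["A61B", "A61C", "A61D"], "A61D", []),
]
for measure in (top_prediction, three_guesses, all_classes):
    print(measure.__name__, precision(results, measure))
```

The two looser measures reflect how the tool is meant to be used: a classifier offered three candidates, or a document that legitimately belongs to several categories, should not be scored as a failure.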
11 © WIPO – 2003 PF & CJF
2.2 Prototype results
[Chart: precision (0% to 100%) by language (English, French, Russian, German), with series for class, subclass and main group. Labelled values, in source order: 88.7%, 88.1%, 89.1%, 90.2%]
12 © WIPO – 2003 PF & CJF
2.2 Improving accuracy with category refining
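[Diagram: IPC tree from sections (A, B, C, ...) through classes (A01, A61, A62) and subclasses (A61B, A61C, A61D) down to main groups (A61D 1/00, A61D 3/00, A61D 5/00). Arrow labels: direct, validate, refine. Panels: Scenario 1, Scenario 2]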
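The slide does not spell out how refining works. One plausible reading, sketched here as an assumption, is that a subclass is validated first and the main group predictions are then restricted to that subclass. The function `refine` and the scores are hypothetical.

```python
def refine(validated_subclass, group_scores):
    """Rank only the main groups that sit under the validated subclass.

    group_scores maps main group symbols such as "A61D 3/00" to a
    categorizer score, where higher means more likely.
    """
    prefix = validated_subclass + " "
    candidates = {
        group: score
        for group, score in group_scores.items()
        if group.startswith(prefix)
    }
    return sorted(candidates, key=candidates.get, reverse=True)


# Illustrative scores only; the highest overall score is in the wrong subclass.
group_scores = {
    "A61D 1/00": 0.2,
    "A61D 3/00": 0.5,
    "A61D 5/00": 0.1,
    "A61C 1/00": 0.7,
}
refined = refine("A61D", group_scores)
print(refined)
```

Under this reading, a direct prediction at main group level would have chosen A61C 1/00, whereas validating the subclass first removes that candidate before the groups are ranked.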
13 © WIPO – 2003 PF & CJF
2.3 Conclusions
It works well!
• Useful user assistance
• Direct categorization at subclass level possible
• IPC codes can be refined accurately to main group level
To get accurate results, one needs:
• Large datasets
• Good category coverage
• Accurate IPC codes
Read the proceedings for more details
Demonstration available after the presentation
14 © WIPO – 2003 PF & CJF
3. IPCCAT
15 © WIPO – 2003 PF & CJF
3.1 CLAIMS Categorizer Perspectives
1. Implementation: IPCCAT
2. Training sets for IPC Categorization: English, French, Spanish, Russian, German, possibly Chinese
3. IPC Data sets improvement & Categorizer Retraining
16 © WIPO – 2003 PF & CJF
3.2 CLAIMS Categorizer Perspectives
4. Improve integration of the IPC Categorizer with other CLAIMS tools
5. CLAIMS policy for distribution of data sets in various languages
17 © WIPO – 2003 PF & CJF
3.2 Access to IPCCAT for PCT
Login: IBGST01
Password: clobterib
19 © WIPO – 2003 PF & CJF
Thank you for your attention
CLAIMS
CLassification Automated InforMation System