1 © WIPO – 2003 PF & CJF CLAIMS Computer-Assisted Categorisation of Patent Documents in the International Patent Classification Patrick Fiévet, CLAIMS Project Manager WIPO & Caspar J. Fall, CLAIMS Consultant ELCA ICIC’03, Nîmes, 22 October 2003 CLassification Automated InforMation System


1 © WIPO – 2003 PF & CJF

CLAIMS

Computer-Assisted Categorisation of Patent Documents in the International

Patent Classification

Patrick Fiévet, CLAIMS Project Manager WIPO & Caspar J. Fall, CLAIMS Consultant ELCA

ICIC’03, Nîmes, 22 October 2003

CLassification Automated InforMation System


Agenda

1. Introduction to CLAIMS project (PF)

2. Computer-assisted categorization prototypes (CJF)

3. CLAIMS Categorizer perspectives (PF)


1. Introduction to CLAIMS Project


1.1 CLAIMS Context

World Intellectual Property Organization (WIPO)

International Patent Classification (IPC)

Classification Automated Information System (CLAIMS)


1.2 CLAIMS Project Objectives

• IPC Reform and revision support

• IPC Categorization assistance to Patent Offices

• IPC Tutorials

• Translation and Natural Language Search in the IPC

IT support for the promotion of the IPC


2. Computer-assisted Categorization


2.1 Objectives

• Develop a solution for predicting International Patent Classification (IPC) codes
  » Facilitate accurate classification in small and medium patent offices
  » Support for documents in multiple languages
  » Categorization assistance tool

• Open questions
  » Depth of computer-assisted categorization
  » What accuracy?


2.1 Key issues

• Survey of automated categorization research

• Patent categorization
  » The IPC is a hierarchical classification: 120 classes, 628 subclasses, 69’000 groups
  » Patents have secondary IPC codes
  » The categories are modified over time
  » Vocabulary is very diverse and technical
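Because the IPC is hierarchical, a single group symbol already encodes its class and subclass. The Python sketch below shows that decomposition for a standard group symbol such as "A61D 5/02". The names parse_ipc and IpcCode are hypothetical, and the pattern is an assumption covering ordinary group symbols only (no edition or version indicators).

```python
import re
from typing import NamedTuple

# Assumed standard IPC notation: section letter (A-H), two-digit class, subclass
# letter, then "main group/subgroup", e.g. "A61D 1/00". Spacing varies between
# sources (e.g. "A 61D1 /00"), so all spaces are removed before matching.
IPC_SYMBOL = re.compile(r"([A-H])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})")


class IpcCode(NamedTuple):
    section: str     # e.g. "A"
    ipc_class: str   # e.g. "A61"
    subclass: str    # e.g. "A61D"
    main_group: str  # e.g. "A61D 5/00"
    symbol: str      # normalized full symbol, e.g. "A61D 5/02"


def parse_ipc(symbol: str) -> IpcCode:
    """Split an IPC group symbol into its hierarchy levels (hypothetical helper)."""
    match = IPC_SYMBOL.fullmatch(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not an IPC group symbol: {symbol!r}")
    section, cls, letter, group, subgroup = match.groups()
    subclass = f"{section}{cls}{letter}"
    return IpcCode(
        section=section,
        ipc_class=f"{section}{cls}",
        subclass=subclass,
        main_group=f"{subclass} {int(group)}/00",
        symbol=f"{subclass} {int(group)}/{subgroup}",
    )


print(parse_ipc("A 61D1 /00"))
# IpcCode(section='A', ipc_class='A61', subclass='A61D', main_group='A61D 1/00', symbol='A61D 1/00')
```

Truncating to the class, subclass or main-group field is what makes it possible to score the same prediction at several depths of the hierarchy.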


2.1 Patent categorization approach

• Machine-learning method to recognize categories
  » Statistical distribution of words

• Establish training data
  » Training documents with good IPC codes
  » 210’000 to 830’000 documents

Advantages
• No need for keywords
• Easy to train the tools
• Can support many languages

Disadvantages
• Never absolute certainty in the results
• Difficult to have reliable full automation
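The slide describes the approach only as learning categories from the statistical distribution of words in documents with known IPC codes. As a minimal illustration of that idea, the sketch below uses multinomial naive Bayes, a simple textbook technique swapped in here; it is not the prototype's algorithm, which the deck does not name. The class WordDistributionClassifier, the tokenize helper and the three training snippets are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # \w matches Unicode letters, so French, German or Russian text is split too;
    # no stemming or stop-word removal in this sketch.
    return re.findall(r"\w+", text.lower())


class WordDistributionClassifier:
    """Multinomial naive Bayes: a stand-in, not the algorithm used by CLAIMS."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                       # Laplace smoothing constant
        self.word_counts = defaultdict(Counter)  # IPC code -> word frequencies
        self.doc_counts = Counter()              # IPC code -> number of documents
        self.vocabulary = set()

    def train(self, documents: list[tuple[str, str]]) -> None:
        # Each example is (text, IPC code): the labelled training data.
        for text, code in documents:
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.doc_counts[code] += 1
            self.vocabulary.update(tokens)

    def rank(self, text: str) -> list[tuple[str, float]]:
        # Return (IPC code, log-probability) pairs, most likely code first.
        # Words never seen in training are ignored.
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        ranking = []
        for code, counts in self.word_counts.items():
            denominator = sum(counts.values()) + self.alpha * len(self.vocabulary)
            log_p = math.log(self.doc_counts[code] / total_docs)
            for token in tokens:
                log_p += math.log((counts[token] + self.alpha) / denominator)
            ranking.append((code, log_p))
        return sorted(ranking, key=lambda pair: pair[1], reverse=True)


classifier = WordDistributionClassifier()
classifier.train([
    ("veterinary instrument for animals", "A61D"),
    ("dental tool for cleaning teeth", "A61C"),
    ("surgical instrument for diagnosis", "A61B"),
])
print(classifier.rank("instrument for treating animals")[0][0])  # A61D
```

Returning a ranked list rather than a single answer fits the listed disadvantages: there is never absolute certainty, so the tool proposes candidates for a human to confirm. No keyword lists are needed, only labelled text.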


2.2 Prototype

• Custom development
  » State-of-the-art algorithm
  » Language independent

• Measure categorization success
  » Compare the predictions with the IPC codes of manually classified documents

[Figure: guesses 1–3 compared with the real IPC codes (mc, ic) under three success measures: Top prediction, Three guesses, All classes]
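The three success measures named on this slide can be written down as simple checks per test document. The sketch below reads "mc" as the document's main code and "ic" as its other (secondary) codes; this is an interpretation of the figure labels, not a definition taken from the deck. All function names and the four test documents are hypothetical.

```python
from typing import Iterable, Sequence

# One test document: (ranked guesses, main code "mc", secondary codes "ic").
Result = tuple[Sequence[str], str, Sequence[str]]


def top_prediction(guesses: Sequence[str], mc: str) -> bool:
    # Success if the first guess is the main code.
    return len(guesses) > 0 and guesses[0] == mc


def three_guesses(guesses: Sequence[str], mc: str) -> bool:
    # Success if the main code is among the first three guesses.
    return mc in guesses[:3]


def all_classes(guesses: Sequence[str], mc: str, ic: Sequence[str]) -> bool:
    # Success if the first guess matches the main code or any secondary code.
    return len(guesses) > 0 and (guesses[0] == mc or guesses[0] in ic)


def success_rates(results: Iterable[Result]) -> dict[str, float]:
    # Share of test documents passing each measure.
    results = list(results)
    n = len(results)
    return {
        "top_prediction": sum(top_prediction(g, mc) for g, mc, _ in results) / n,
        "three_guesses": sum(three_guesses(g, mc) for g, mc, _ in results) / n,
        "all_classes": sum(all_classes(g, mc, ic) for g, mc, ic in results) / n,
    }


test_set = [
    (["A61D", "A61B", "A01K"], "A61D", ["A01K"]),  # first guess is the main code
    (["A61B", "A61D", "A61C"], "A61D", []),        # main code only in second place
    (["A01K", "A61B", "A61C"], "A61D", ["A01K"]),  # first guess is a secondary code
    (["B65D", "A61B", "A61C"], "A61D", []),        # no match at all
]
print(success_rates(test_set))
# {'top_prediction': 0.25, 'three_guesses': 0.5, 'all_classes': 0.5}
```

The second and third documents show why the measures differ: each one credits a different kind of near miss that a strict top-prediction score ignores.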


2.2 Prototype results

[Chart: precision (0–100%) for English, French, Russian and German at class, subclass and main group level; data labels shown: 88.7%, 88.1%, 89.1%, 90.2%]
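The results slide reports precision at class, subclass and main group level. One way to score one set of predictions at several depths is to truncate both the predicted and the manually assigned code to the same level before comparing them, as in the sketch below. Precision is computed here as the share of test documents whose top prediction matches. The deck does not state its exact definition, so this is an assumption, and the helper names and example pairs are hypothetical (they do not reproduce the chart's data).

```python
from typing import Iterable


def truncate(code: str, level: str) -> str:
    # Cut an IPC symbol such as "A61D 5/02" down to the requested level.
    compact = code.replace(" ", "")
    if level == "class":
        return compact[:3]                       # "A61"
    if level == "subclass":
        return compact[:4]                       # "A61D"
    if level == "main group":
        group = compact[4:].split("/")[0]
        return f"{compact[:4]} {int(group)}/00"  # "A61D 5/00"
    raise ValueError(f"unknown level: {level!r}")


def precision_at_level(pairs: Iterable[tuple[str, str]], level: str) -> float:
    # pairs: (top predicted code, manually assigned main code) per test document.
    pairs = list(pairs)
    hits = sum(truncate(pred, level) == truncate(actual, level) for pred, actual in pairs)
    return hits / len(pairs)


pairs = [
    ("A61D 5/02", "A61D 5/00"),
    ("A61D 3/00", "A61D 1/00"),
    ("A61B 1/00", "A61D 1/00"),
    ("B65D 1/00", "A61D 1/00"),
]
for level in ("class", "subclass", "main group"):
    print(level, precision_at_level(pairs, level))
# class 0.75
# subclass 0.5
# main group 0.25
```

The toy pairs show the expected pattern: the deeper the level, the more ways a prediction can miss, so precision can only stay the same or drop as codes are compared at finer granularity.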


2.2 Improving accuracy with category refining

[Diagram: IPC hierarchy from sections (A, B, C, ...) to classes (A01, A61, A62), subclasses (A61B, A61C, A61D) and main groups (A61D 1/00, A61D 3/00, A61D 5/00); steps labelled direct, validate and refine; Scenario 1 and Scenario 2]
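The refining diagram labels three steps (direct, validate, refine), and the conclusions state that direct categorization works at subclass level and that codes can be refined to main group level. One plausible reading is sketched below: rank subclasses, let the examiner validate one, then rank only the main groups under it. The functions, the validate callback and the score dictionaries are hypothetical; in practice the scores would presumably come from classifiers trained for each level.

```python
from typing import Callable, Mapping, Sequence


def direct(subclass_scores: Mapping[str, float], k: int = 3) -> list[str]:
    # Step "direct": rank subclasses by classifier score and keep the top k.
    return sorted(subclass_scores, key=subclass_scores.get, reverse=True)[:k]


def refine(subclass: str, group_scores: Mapping[str, float]) -> list[str]:
    # Step "refine": rank only the main groups lying under the validated subclass.
    candidates = [g for g in group_scores if g.replace(" ", "").startswith(subclass)]
    return sorted(candidates, key=group_scores.get, reverse=True)


def categorize(
    subclass_scores: Mapping[str, float],
    group_scores: Mapping[str, float],
    validate: Callable[[Sequence[str]], str],
) -> tuple[str, list[str]]:
    shortlist = direct(subclass_scores)
    chosen = validate(shortlist)  # step "validate": the examiner confirms or corrects
    return chosen, refine(chosen, group_scores)


subclass_scores = {"A61D": 0.7, "A61B": 0.2, "A61C": 0.1}
group_scores = {"A61D 1/00": 0.2, "A61D 3/00": 0.5, "A61D 5/00": 0.3, "A61B 5/00": 0.9}
chosen, groups = categorize(subclass_scores, group_scores, validate=lambda s: s[0])
print(chosen, groups)  # A61D ['A61D 3/00', 'A61D 5/00', 'A61D 1/00']
```

Note that "A61B 5/00" is dropped despite having the highest score: once a subclass has been validated, refinement only considers groups beneath it, which narrows the choice from tens of thousands of groups to a handful.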


2.3 Conclusions

• It works well!
  » Useful user assistance
  » Direct categorization at subclass level possible
  » IPC codes can be refined accurately to main group level

• To get accurate results, one needs:
  » Large datasets
  » Good category coverage
  » Accurate IPC codes

• Read the proceedings for more details

• Demonstration available after the presentation


3. IPCCAT


3.1 CLAIMS Categorizer Perspectives

1. Implementation: IPCCAT

2. Training sets for IPC categorization: English, French, Spanish and Russian; German and possibly Chinese

3. IPC Data sets improvement & Categorizer Retraining


3.2 CLAIMS Categorizer Perspectives

4. Improve integration of the IPC Categorizer with other CLAIMS tools

5. CLAIMS policy for distribution of data sets in various languages


3.2 Access to IPCCAT for PCT

Login: IBGST01

Password: clobterib


Questions / Answers

Patrick Fiévet: [email protected]


Thank you for your attention

CLAIMS: CLassification Automated InforMation System