Linked Inventor Biography Data - A new method of inventor disambiguation ORCID-OECD-Crossref Workshop on Identifiers and Intellectual Property Paris, June 22nd 2017 Matthias Dorner Max-Planck Institute for Innovation and Competition (MPI-IC) and Institute for Employment Research (IAB)


Page 1: Linked Inventor Biography Data - A new method of inventor disambiguation

Max Planck Institute for Innovation and Competition | Munich

ORCID-OECD-Crossref Workshop on Identifiers and Intellectual Property

Paris, June 22nd 2017

Matthias Dorner
Max-Planck Institute for Innovation and Competition (MPI-IC) and Institute for Employment Research (IAB)

Page 2: Motivation

§ Patent data are the main data source for economic analyses of innovation and IP (Griliches 1990)

§ No unique inventor ID across patents → “who-is-who” problem

§ Quality of disambiguation is crucial for the quality of any research and policy advice

§ Naive name disambiguation yields inflated patent portfolios due to common names

§ More adequate disambiguation approaches consider additional features

1. “Internal”
- Only (internal) information from patent register data
- Deterministic/probabilistic assignment rules to group patent-inventor records into unique persons (see e.g., Trajtenberg et al. 2006, Raffo & Lhuillery 2009, Li et al. 2014, Pezzoni et al. 2014, Ventura et al. 2015, Morrison et al. 2017)

2. “External”
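To illustrate the common-names point above, here is a minimal Python sketch: grouping patent-inventor records by name alone merges distinct people into one portfolio. The records, the field names, and the name-plus-city key are invented for illustration and are not taken from Patstat or from the method presented in these slides.

```python
from collections import defaultdict

# Hypothetical patent-inventor records; all values are invented for illustration.
records = [
    {"patent": "P1", "name": "THOMAS MUELLER", "city": "MUNICH"},
    {"patent": "P2", "name": "THOMAS MUELLER", "city": "MUNICH"},
    {"patent": "P3", "name": "THOMAS MUELLER", "city": "HAMBURG"},
    {"patent": "P4", "name": "ANNA SCHMIDT", "city": "NUREMBERG"},
]

def group_patents(records, key_fields):
    """Group patent ids by exact match on the given record fields."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        groups[key].append(rec["patent"])
    return dict(groups)

# Naive: name only. All THOMAS MUELLER records collapse into one "inventor".
naive = group_patents(records, ["name"])

# One additional feature (city) already splits the inflated portfolio.
refined = group_patents(records, ["name", "city"])

print(len(naive), "groups by name;", len(refined), "groups by name and city")
```

Exact matching on name plus city is itself too crude (people move, spellings vary), which is why the approaches cited above apply deterministic or probabilistic rules over several features.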

Page 3: “External” inventor disambiguation

§ Motivation (see Dorner et al. 2014)
• Big data approach: use structure of external data and record linkage for disambiguation
• Merge additional variables from external data (→ research)

§ Requirements
• Biographical data with unique person ID, name and address information
• Data must be regularly updated and include time stamps
• Data availability (open vs. confidential data) and coverage (subpopulation vs. global)
➢ Administrative labor market data collected within the German social security system

§ Two-step approach
• (1) Identification: Identify subset of inventors in the external data
• (2) Grouping: Completion of inventor biographies for the matched inventors

Page 4

(1) Identification step
• Patstat: Inventors (ambiguous) i) listed on EP patents 1999-2011, ii) residential address in Germany
• IAB: Full population of employees in Germany (subject to social security 1999-2011)
→ Disambiguated inventors/employees (I): COMPLETE employment biography data (1980-2015), INCOMPLETE patent biography data (only 1999-2011)

(2) Grouping step
• Disambiguated inventors/employees (I)
• Patstat: Full population of patent-inventor-assignee records in Germany (1980-2015)
→ Disambiguated inventors/employees (II): COMPLETE inventor biography data covering patents and employment 1980-2015

Page 5: Data sources

(A)
+ more detailed address information from inventor designation files of EP patents
+ more detailed address information from assignee file of EP patents

(B) External data

Variables
§ Social security ID (= persistent identifier)
§ Last name
§ First name
§ Residential address
§ Firm address
§ Year of employment
§ Others: education, employment status, occupation, wage, establishment (id, size, industry, ...)

[Figure: sample record: 123456789; Putzker, Rene; Hauptstraße 12; 42579 Heiligenhaus; DE; 01012010; 31122010; 666666]

Page 6: Identification step

A - Inventor-patent records 1999-2011
First name, last name, street, house number, city, zip code, year (patent application)

B - Employees 1999-2011
First name, last name, street, house number, city, zip code, year (employment episode)

Pairwise record linkage: probabilistic string matching, blocking by year
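The linkage step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `difflib.SequenceMatcher` from the standard library stands in for the string comparator (the slides do not name one), the field weights and the 0.85 threshold are arbitrary, and all records and field names are invented.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical records; field names and values are invented for illustration.
patent_records = [  # A: inventor-patent records
    {"id": "A1", "first": "ANNA", "last": "SCHMIDT", "street": "HAUPTSTRASSE",
     "house": "12", "city": "KOELN", "zip": "50667", "year": 2010},
]
employee_records = [  # B: employment episodes
    {"id": "B1", "first": "ANNA", "last": "SCHMIDT", "street": "HAUPTSTR.",
     "house": "12", "city": "KOELN", "zip": "50667", "year": 2010},
    {"id": "B2", "first": "ANJA", "last": "SCHMITZ", "street": "BAHNHOFSTRASSE",
     "house": "3", "city": "KOELN", "zip": "50823", "year": 2010},
    {"id": "B3", "first": "ANNA", "last": "SCHMIDT", "street": "HAUPTSTRASSE",
     "house": "12", "city": "KOELN", "zip": "50667", "year": 2005},
]

# Assumed weights (sum to 1); a real linkage would estimate these from data.
WEIGHTS = {"first": 0.25, "last": 0.30, "street": 0.15,
           "house": 0.05, "city": 0.10, "zip": 0.15}

def similarity(a, b):
    """Similarity in [0, 1] between two strings (SequenceMatcher ratio)."""
    return SequenceMatcher(None, a, b).ratio()

def pair_score(rec_a, rec_b):
    """Weighted average of field-wise string similarities."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())

def link(a_records, b_records, threshold=0.85):
    """Compare only records sharing a year (blocking); keep pairs above threshold."""
    blocks = defaultdict(list)
    for b in b_records:
        blocks[b["year"]].append(b)
    matches = []
    for a in a_records:
        for b in blocks.get(a["year"], []):  # candidates from the same year only
            score = pair_score(a, b)
            if score >= threshold:
                matches.append((a["id"], b["id"], round(score, 3)))
    return matches

matches = link(patent_records, employee_records)
print(matches)
```

Blocking by year keeps the number of comparisons manageable, at the cost of never comparing records from different years: here B3 agrees with A1 on every field but is not a candidate because its year differs.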

Page 7: Identification step

[Figure: timeline of an inventor's employment spells (establishments E1, E3, E4, E5) and patents P1-P5; time axis marked 1975, 1994, 1999, 2011, 2015.]

Legend: Employment spell of inventor I in establishment E

Page 8: Grouping step (1)

A1 - Biography prior to 1999
First name, last name, year of employment, municipality code workplace (IEB), industry code (current establishment)

A2 - Biography after 2011
First name, last name, year of employment, municipality code workplace (IEB), municipality code residence (IEB), industry code (current establishment)

B - Inventor-patent records prior to 1999 and after 2011
First name, last name, application year, municipality code workplace (= geocoded assignee address), municipality code residence (= geocoded inventor address), technology area of patent (Schmoch 2008)

Pairwise record linkage: probabilistic string matching, blocking by year

Page 9: Grouping step (2)

§ Classification features
• Name similarity score (previous slide)
• Geographical proximity (residence/workplace) increases likelihood of belonging to the same entity → municipality distance matrices
• Technology profile of industries → industry-technology concordances (Dorner & Harhoff 2017)
• Common names (based on administrative labor market data of IAB)
• Additional features to be considered in the future: e.g., shared co-inventors/co-assignees

§ Supervised classification
• Extract and label training data
• Classify full set of records based on labeled training data (currently: logistic regression model)
➢ Out-of-sample prediction of record pairs based on previously labeled data, disregard false positive records
➢ Result: Complete inventor biography (= list of patents per disambiguated inventor)
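The supervised step can be illustrated with a small, dependency-free logistic regression in Python. Everything concrete here is an assumption made for illustration: the three features (name similarity, scaled distance between municipalities, industry-technology fit), the labeled pairs, the learning rate, and the 0.5 cutoff are invented, and plain gradient descent is used for brevity rather than whatever estimator the project actually uses.

```python
import math

# Hypothetical labeled record pairs: ([name_sim, distance_scaled, tech_fit], label).
# label 1 = same person, 0 = different persons. All values are invented.
training = [
    ([0.98, 0.00, 0.9], 1),
    ([0.95, 0.05, 0.8], 1),
    ([0.90, 0.10, 0.7], 1),
    ([0.97, 0.02, 0.6], 1),
    ([0.92, 0.90, 0.1], 0),
    ([0.70, 0.20, 0.3], 0),
    ([0.60, 0.80, 0.2], 0),
    ([0.75, 0.60, 0.4], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Predicted probability that a record pair refers to the same person."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit(data, lr=0.5, epochs=5000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            err = predict_proba(weights, bias, x) - y  # d(loss)/d(logit)
            grad_b += err
            for k, xi in enumerate(x):
                grad_w[k] += err * xi
        bias -= lr * grad_b / len(data)
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
    return weights, bias

weights, bias = fit(training)

# Out-of-sample prediction for unlabeled candidate pairs (also invented).
candidates = {
    "pair_1": [0.965, 0.025, 0.85],
    "pair_2": [0.675, 0.70, 0.30],
}
accepted = [pid for pid, x in candidates.items()
            if predict_proba(weights, bias, x) >= 0.5]
print(accepted)
```

Pairs below the cutoff are discarded; the patents of the accepted pairs would then be appended to the matched inventor's biography.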

Page 10: Supervised classification – A training example

KIRSTEN KUHLE

KRISTIN KUHLE
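A pair like this is hard to decide on string similarity alone: the two names differ by only a few letters, and the slide does not say how the pair was labeled. The Python sketch below shows how two generic measures score it. Both measures (a hand-written Levenshtein distance and the standard-library `difflib.SequenceMatcher` ratio) are assumptions chosen for illustration, since the slides do not state which comparator is used.

```python
from difflib import SequenceMatcher

def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def normalized_similarity(a, b):
    """1 - distance / max length, so identical strings score 1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

name_a = "KIRSTEN KUHLE"
name_b = "KRISTIN KUHLE"

print("edit distance:", levenshtein(name_a, name_b))
print("normalized similarity:", round(normalized_similarity(name_a, name_b), 3))
print("SequenceMatcher ratio:", round(SequenceMatcher(None, name_a, name_b).ratio(), 3))
```

With three edits across thirteen characters, the normalized edit similarity is about 0.77, a middling score that a fixed name threshold could place on either side. Labeled examples of this kind let the classifier weigh name similarity against the other features.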

Page 11: Identification step, Grouping step

[Figure: timeline of an inventor's employment spells (establishments E1, E3, E4, E5) and patents P1-P7; time axis marked 1975, 1994, 1999, 2011, 2015.]

Page 12: Discussion: Transferability to ORCID

§ Biographical information
(→ similar to IAB but with lower time resolution; international coverage is an advantage)

§ Unique person id and name(s)
(→ generate multiple accounts per person)

§ CVs recorded in social/career networks
(→ use APIs, SiSOB crawling (Geuna et al. 2015))

§ Further information on employment context
(→ use concordances to match keywords with scientific fields or industries)

Page 13: Summary and outlook

§ Disambiguation is essential for microeconomic research focussing on inventors
• Internal vs. external disambiguation
• External approach identifies unique inventors based on structured external data
➢ Approach could be adapted to ORCID iD or other persistent identifiers

§ Limitations
• Subject to availability and quality/precision of external data
• Only feasible for subpopulation of disambiguated inventors (vs. global scale)

§ Further steps in our project
• Finalize data generation and quality assessment of disambiguation
• Release anonymized research data set via IAB-FDZ and update data on a regular basis
• Research: topics include e.g., labor mobility of inventors and productivity, knowledge spillovers in inventor teams and within firms

Page 14: References

§ Dorner, M., Bender, S., Harhoff, D., Hoisl, K., P. Scioch (2014). The MPI-IC-IAB-Inventor data 2002 (MIID 2002): Record-linkage of patent register data with labor market biography data of the IAB. FDZ-Methodenreport 06/2014. Nürnberg: IAB.

§ Dorner, M., D. Harhoff (2017). A novel technology-industry concordance based on linked inventor-establishment data. Unpublished working paper. Research Policy - revise and resubmit.

§ Geuna, A., Kataishi, R., Toselli, M., Guzmán, E., Lawson, C., Fernandez-Zubieta, A., B. Barros (2015). SiSOB data extraction and codification: A tool to analyze scientific careers. Research Policy, 44(9), 1645–1658.

§ Griliches, Z. (1990). Patents as Economic Indicators: A Survey. Journal of Economic Literature, 28(4), 1661–1707.

§ Li, G.-C. et al. (2014). Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010). Research Policy, 43(6), 941–955.

§ Morrison, G., Riccaboni, M., F. Pammolli (2017). Disambiguation of patent inventors and assignees using high-resolution geolocation data. Nature – Scientific Data, 4:170064, DOI: 10.1038/sdata.2017.64.

§ Pezzoni, M., Lissoni, F., G. Tarasconi (2014). How to kill inventors: testing the Massacrator algorithm for inventor disambiguation. Scientometrics, 101(1), 477–504.

§ Raffo, J., S. Lhuillery (2009). How to play the ‘names game’: Patent retrieval comparing different heuristics. Research Policy, 38(10), 1617–1627.

§ Schmoch, U. (2008). Concept of a Technology Classification for Country Comparisons. Final Report to the World Intellectual Property Organisation (WIPO), Karlsruhe: Fraunhofer ISI.

§ Trajtenberg, M., G. Shiff, R. Melamed (2006). The Names Game: Harnessing Inventors’ Patent Data for Economic Research. NBER Working Paper No. 12479. Cambridge/MA.

§ Ventura, S., Nugent, R., E. Fuchs (2015). Seeing the non-stars: (some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records. Research Policy, 44(9), 1672–1701.

Page 15

Thank you!

Matthias Dorner
Max-Planck Institute for Innovation and Competition (MPI-IC) and Institute for Employment Research (IAB)
[email protected]