Swedish inventors - matching to registers and descriptive data. Presentation at APE-INV, Brussels, September 5th 2011. Lina Ahlin (lina.ahlin@circle.lu.se) and Olof Ejermo (olof.ejermo@circle.lu.se).
CIRCLE, Centre for Innovation, Research and Competence in the Learning Economy
Lund University, P.O. Box 117, SE-221 00 Lund, Sweden
Swedish inventors - matching to registers and descriptive data
Presentation at APE-INV, Brussels, September 5th 2011
Lina Ahlin and Olof Ejermo
lina.ahlin@circle.lu.se, olof.ejermo@circle.lu.se
On the agenda
• What is so special about Swedish data?
• 1st matching
• 2nd matching
• Future: how to reach a 100% match rate?
• (Results)
Linking inventors to registers
• Patents applied for at the EPO 1978-2009 by inventors with addresses in Sweden.
• Matching done on name-home address combinations
• Problem 1: different inventors may have the same name
• Problem 2: addresses may be old
• How to verify person identity and connect to Swedish register data?
Swedish data
Q: What makes Swedish data so exciting (and why do we want a high match rate)?
A: Through Statistics Sweden it is possible to connect individuals to register data, which links several levels of information relevant for innovation studies:
• Individual level: field/level of education, age, income, gender, workplace
• Regions: workplace, home municipality
• Sectoral level: sectors, firm size, level of R&D
... this can give a multifaceted view of innovation, but a personal identifier (”personnummer”) is needed to do this,
e.g. 19500131-3422
Birth date: Jan 31st, 1950. Second-to-last digit even = female
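As a minimal sketch of how the example identifier decomposes, the Python snippet below splits a personnummer written as `YYYYMMDD-NNNN` into birth date and gender. The function name `parse_personnummer` is a hypothetical helper, not part of any register service, and the sketch does not validate the check digit.

```python
from datetime import date


def parse_personnummer(pnr: str) -> dict:
    """Split a personnummer 'YYYYMMDD-NNNN' into birth date and gender."""
    birth_part, last_four = pnr.split("-")
    birth_date = date(int(birth_part[:4]), int(birth_part[4:6]), int(birth_part[6:8]))
    # The second-to-last digit is even for women and odd for men.
    gender = "female" if int(last_four[2]) % 2 == 0 else "male"
    return {"birth_date": birth_date, "gender": gender}


info = parse_personnummer("19500131-3422")
print(info["birth_date"].isoformat(), info["gender"])
```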
1st matching (Oct-Dec 2010)
• All Swedes (incl. personnummer) are listed in the address register ”SPAR”
• Matching of addresses through InfoTorg, which stores addresses/address changes for the latest 3 years, gives the addition of personnummer
  – Individuals under 16 not matched
• Old patents added under the assumption that identical name-address combinations refer to the same person:
  Register 2008-2010: Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm
  = Patent applied for in 1992: Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm
Match rate: 64% of inventor-patent pairs, ranging from a low of 23% in 1978 to a high of 93% in 2008. The spread is due to the mobility of inventors: the older the patent, the less likely the address on it still appears in the recent register.
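The exact-match assumption above can be sketched as a lookup on a normalized name-address key. This is only an illustration: the records, the identifier `person-001` and the helper `make_key` are hypothetical, and the actual matching was done through InfoTorg rather than with code like this.

```python
def make_key(name: str, street: str, postal: str) -> str:
    """Build a name-address key; lower-case and collapse repeated whitespace."""
    return " ".join(f"{name}, {street}, {postal}".lower().split())


# Hypothetical register extract (2008-2010): key -> person identifier
register = {
    make_key("Sven Ivar Johanson", "Storgatan 1", "111 00 Stockholm"): "person-001",
}

# Hypothetical inventor records taken from patent applications
patents = [
    {"patent": "EP-A", "year": 1992, "name": "Sven Ivar Johanson",
     "street": "Storgatan 1", "postal": "111 00  Stockholm"},
    {"patent": "EP-B", "year": 1985, "name": "Anna Berg",
     "street": "Lillgatan 2", "postal": "222 22 Lund"},
]

# An inventor-patent pair is matched only if its key appears in the register.
for rec in patents:
    key = make_key(rec["name"], rec["street"], rec["postal"])
    rec["person_id"] = register.get(key)

match_rate = sum(r["person_id"] is not None for r in patents) / len(patents)
print(f"match rate: {match_rate:.0%}")
```

An inventor who moved between the patent application and the register window produces a different key, which is why exact matching loses older patents.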
• InfoTorg returned a 56% match rate
• Manual check (visual, no robot): +8%
64% match rate
[Figure: match rate (fractions, 64% overall) by year, 1978-2008; vertical axis 0% to 100%]
1985-2005: present access to individual registers at Statistics Sweden
2006-2009: additions as of Sep. 30th 2011
2nd matching (April-Sep 2011)
• Use public access to registers (Swedish genealogical association)
  – CDs of the Swedish population (1980)/1990, published with old addresses and birth dates
  – CD ”Book of dead” 1901-2009: address at death + personnummer
• Match birth date + name to personnummer using the service by InfoTorg or online sources
Methodology
• Extract data from the Swedish ”Book of dead” and the Swedish genealogy records for 1990 (to some extent also 1980) on all individuals in the population, by initial letter
• Generate a variable containing name, address and postal address for all individuals in the population, as well as for inventors who are not fully matched
Normalized Levenshtein (”strgroup”) in STATA
• An example of the ”name-address” string:
  ”Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm” (from EPO)
  = ”Sven Ifwar Johanson, Storgatan 1, 111 00 Stockholm” (from Swedish population 1990)
• Replace/insert 3 letters to make the strings equal
• Divide by the length of the shortest string (48): 3/48 = 0.0625 (= a good hit)
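The normalization can be illustrated with a short Python sketch. It is a plain dynamic-programming Levenshtein distance divided by the length of the shorter string, written here as an assumption about what the ”strgroup” step computes, not a port of the Stata command. For the two strings exactly as typed above, the sketch counts 2 edits over a shortest length of 49, a little below the slide's 3/48, so the slide's count may rest on a slightly different spelling or string layout.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the shorter string."""
    return levenshtein(a, b) / min(len(a), len(b))


epo = "Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm"
pop = "Sven Ifwar Johanson, Storgatan 1, 111 00 Stockholm"
print(levenshtein(epo, pop), min(len(epo), len(pop)),
      round(normalized_levenshtein(epo, pop), 4))
```

Values close to 0 indicate near-identical strings; where to draw the line for a ”good hit” remains a judgment call that the manual check is meant to back up.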
Adding date of birth
1. 1990 Levenshtein: names & addresses
2. 1990 Levenshtein: unique names
3. Levenshtein from CD of the dead 1901-2009: names and addresses
4. Strgroup: similarity on name-address hits 1-3
5. Some manual additions and minor changes
6. 1980 Levenshtein: names and addresses (letters D&H)
Methodology: continued
• Manually examine each match to see whether the Levenshtein command has matched correctly
• Some hits discarded, incl. ambiguous name-match hits
New match rate 80%
[Figure: match rate by year, 1978-2009, comparing the 1st matching (fractions, 64%) with the 2nd matching (fractions, 80%); vertical axes 0% to 100%]
Adding personnummer (ongoing)
New match rate 80%, but not full personnummer. What to do?
1. Use the date-of-birth part of the personal number for fully matched inventors
2. Join all possible combinations of birth dates for those fully matched and those with only birth dates.
3. Run Levenshtein distance on inventor names
4. Small Levenshtein distance: accept that the inventors are the same, since name and birth date match
5. Large Levenshtein distance: reject
6. Further, manually check remaining inventors. Look at addresses for further confirmation if uncertain.
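Steps 2 to 6 can be sketched in Python as a join on birth date followed by a name comparison. The records, the identifiers and the cut-off `ACCEPT_BELOW` are hypothetical; the slides do not state which threshold separates a small distance from a large one.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def normalized(a: str, b: str) -> float:
    return levenshtein(a, b) / min(len(a), len(b))


ACCEPT_BELOW = 0.15  # hypothetical cut-off for a "small" distance

# Fully matched inventors (complete identifier known); made-up records
full = [
    {"name": "Sven Ivar Johanson", "birth": "19500131", "person_id": "person-001"},
    {"name": "Karin Lund", "birth": "19620514", "person_id": "person-002"},
]
# Inventors for whom only the birth date is known; made-up records
partial = [
    {"name": "Sven Ifwar Johanson", "birth": "19500131"},
    {"name": "Per Olsson", "birth": "19620514"},
]

# Step 2: index fully matched inventors by birth date for the join
by_birth = {}
for rec in full:
    by_birth.setdefault(rec["birth"], []).append(rec)

accepted, manual_check = [], []
for rec in partial:
    # Steps 3-5: compare names only among candidates sharing the birth date
    hits = [c for c in by_birth.get(rec["birth"], [])
            if normalized(rec["name"].lower(), c["name"].lower()) < ACCEPT_BELOW]
    if len(hits) == 1:
        accepted.append((rec["name"], hits[0]["person_id"]))
    else:
        manual_check.append(rec["name"])  # Step 6: left for manual inspection

print(accepted, manual_check)
```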
Adding personnummer ctd.
• Use the ”Book of dead” for the years 1975-2009. Use the date-of-birth part of personal numbers
• Re-run steps 2-6 from the previous slide
Adding personnummer ctd.
Problem: not all inventors were previously identified, so the 4 last digits are missing.
Two options to get full personal numbers from birth dates:
1. Use InfoTorg again with name + added parameter ”birthdate”
2. Manually add the four last digits by using an internet service (www.upplysning.se)
Some matching problems
• Difficult to match individuals who change last names (mainly women), who have common names, or who move a lot.
• Two people with the same name can live at the same address (e.g. a father names his son after himself), so the wrong person may be matched. If detected, the oldest person is chosen.
• For inventors affiliated with some firms (e.g. AstraZeneca), the company address is given instead of the home address
Towards 100%
• Idea: scoring methods based on identified inventors
  – Name
  – Identified co-inventors
  – Technology class
  – City
  – Postal code
  – Which algorithm?
• Statistics Sweden for validating the parent/child name similarity problem?
• Use 1980 population CD?
• Strategy of focusing on highly productive unmatched inventors?
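One way the scoring idea could look is a weighted sum over the listed attributes. The Python sketch below is purely illustrative: the weights, the records and the use of `difflib.SequenceMatcher` for name similarity are assumptions made for the example, and the question of which algorithm to use remains open.

```python
from difflib import SequenceMatcher

# Hypothetical weights; they would need calibrating on already identified inventors.
WEIGHTS = {"name": 0.40, "coinventors": 0.25, "tech_class": 0.15,
           "city": 0.10, "postal": 0.10}


def score(unmatched: dict, identified: dict) -> float:
    """Weighted similarity between an unmatched and an identified inventor record."""
    name_sim = SequenceMatcher(None, unmatched["name"].lower(),
                               identified["name"].lower()).ratio()
    shares_coinventor = bool(set(unmatched["coinventors"]) & set(identified["coinventors"]))
    return (WEIGHTS["name"] * name_sim
            + WEIGHTS["coinventors"] * shares_coinventor
            + WEIGHTS["tech_class"] * (unmatched["tech_class"] == identified["tech_class"])
            + WEIGHTS["city"] * (unmatched["city"] == identified["city"])
            + WEIGHTS["postal"] * (unmatched["postal"] == identified["postal"]))


# Made-up records for illustration only
unmatched = {"name": "Sven Ifwar Johanson", "coinventors": ["Karin Lund"],
             "tech_class": "A61K", "city": "Stockholm", "postal": "111 00"}
candidates = [
    {"id": "person-001", "name": "Sven Ivar Johanson", "coinventors": ["Karin Lund"],
     "tech_class": "A61K", "city": "Stockholm", "postal": "111 00"},
    {"id": "person-003", "name": "Sven Johansson", "coinventors": ["Per Olsson"],
     "tech_class": "H04L", "city": "Lund", "postal": "222 22"},
]

best = max(candidates, key=lambda c: score(unmatched, c))
print(best["id"], round(score(unmatched, best), 3))
```

A design question for any such score is how to set the acceptance threshold; the inventors already identified with a personnummer could serve as labelled cases for that.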
Suggestions/questions
Patent distribution by sector
Patent distribution in manufacturing (share of total patenting)
Patent distribution in services (share of total patenting).
Education level among inventors
Percentile distribution of inventors’ patent productivity (percentile values).

| Percentile | All patents | Contribution | Patents 2004-07 | Contribution 2004-07 |
|---|---|---|---|---|
| 1% | 1 | 0.12 | 1 | 0.11 |
| 5% | 1 | 0.20 | 1 | 0.17 |
| 10% | 1 | 0.25 | 1 | 0.20 |
| 25% | 1 | 0.33 | 1 | 0.33 |
| 50% | 1 | 0.83 | 1 | 0.50 |
| 75% | 3 | 1.50 | 2 | 1.00 |
| 90% | 6 | 3.00 | 4 | 2.00 |
| 95% | 9 | 5.00 | 6 | 3.00 |
| 99% | 21 | 11.50 | 12 | 5.83 |
| Mean/inventor | 2.81 | 1.40 | 2.06 | 0.97 |
| Number of inventors | 18 489 | 18 489 | 8 526 | 8 526 |
Sectors, SNI92-codes, # inventors, contribution 2004-2005.
| Sector | SNI92-codes | Unique inventors, mean/year 2004-2005 | Contribution*, mean 2004-2005 | % cooperation cross sector 1994-1995 | % cooperation cross sector 2004-2005 |
|---|---|---|---|---|---|
| Primary | 1000-14999 | 8.5 | 5.9 | 28% | 28% |
| Manufacturing | 15000-37999 | 1567 | 749.9 | 11% | 11% |
| Services | 38000-74999, 80410, 80423-80425, 80427-80429, 85200, 85325, 91111-91330, 92110-92130, 92310, 92330-92400, 92611-92614, 92621-99000 | 806.5 | 411.1 | 23% | 23% |
| Academia | 80301-80309 and ** | 190 | 72.6 | 54% | 54% |
| Public sector | 75000-80299, 80421-80422, 80426, 85000-85140, 85311-85324, 90000-90008, 92200, 92320, 92511/92530, 92615 | 62.5 | 28.4 | 67% | 67% |

* ”Contribution” counts patent fractions, which adjusts for co-inventorship.
** ”Academia” can also in a few cases be found in the sectors R&D in technical and natural sciences (73101-73104) and in technical testing and analysis (74300).
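To illustrate the difference between patent counts and ”contribution”, the Python sketch below applies fractional counting, assuming each of the n inventors on a patent is credited 1/n of it. The patents and inventor labels are made up for the example.

```python
from collections import defaultdict

# Hypothetical patents mapped to their inventor lists
patents = {
    "P1": ["A", "B"],
    "P2": ["A"],
    "P3": ["A", "B", "C", "D"],
}

patent_count = defaultdict(int)    # whole patents per inventor
contribution = defaultdict(float)  # fractional credit per inventor

for inventors in patents.values():
    for inventor in inventors:
        patent_count[inventor] += 1
        contribution[inventor] += 1 / len(inventors)

for inventor in sorted(patent_count):
    print(inventor, patent_count[inventor], contribution[inventor])
```

Under this convention the contributions summed over all inventors equal the number of patents, which is what makes sector totals comparable despite co-inventorship.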
Cooperation by sector, 2004-05
Columns: Primary, Manufacturing, Services, Academia, Public sector, Sum
Primary: 43% 57% 0% 0% 100%
Manufacturing: 1% 77% 17% 5% 100%
Services: 1% 66% 24% 9% 100%
Academia: 0% 29% 48% 22% 100%
Public sector: 0% 18% 37% 45% 100%
The most important patenting academic institutions 2004-2005
| Univ/institute | Contributions/year | Share | Patents/billion research revenue SEK | Patents/thousand FTE, NTM |
|---|---|---|---|---|
| Lund | 20.3 | 23% | 6.3 | 15.0 |
| Uppsala | 11.6 | 13% | 4.2 | 9.7 |
| Karolinska | 11.6 | 13% | 3.9 | 9.3 |
| KTH | 9.8 | 11% | 5.7 | 8.7 |
| Göteborg | 9.0 | 10% | 3.7 | 10.9 |
| Linköping | 7.9 | 9% | 6.4 | 10.3 |
| Chalmers | 7.2 | 8% | 5.1 | 8.6 |
| Stockholm | 2.9 | 3% | 1.7 | 4.1 |
| Umeå | 2.3 | 3% | 1.5 | 2.8 |
| Sum | 82.6 | 94% | 4.4 | 9.3 |
| Others (13) | 5.0 | 6% | 1.3 | 1.8 |