Proposal of a collaboration to improve the ethnicity classification of patient registers
Pablo Mateos
UCL - CASA
25th May 2005
Contents
1. Aims of proposal
2. Mutual Benefits & Justification
3. Members
4. Data Sharing
5. Data Protection
6. Intellectual Property
7. Project Name
8. Open Discussion
1- Aims
The purposes of the group are threefold:
• To facilitate access to the Names-to-CEL (Cultural, Ethnic and Linguistic group) directory developed by CASA
• To develop and share knowledge relating to:
– Effective use of the Names-to-CEL directory in public health
– Data mining of birthplace information in the ‘Exeter’ register
• To improve the quality and accuracy of the directory by contributing anonymised data from operational files
2- Mutual Benefits
Benefits for the model
• Wider population base per surname
– More ethnic groups better represented
– Better forename or surname matches
• More extensive birthplace name alias tables
Benefits for the PCTs
• Birthplace information correctly classified
• Ethnic group classification provided
• Richer ethnic classification:
– Beyond the 16+ ethnic categories
– At individual level
• Know-how already built
One PCT is not enough
Figure: Frequency distribution of Camden PCT surnames and forenames held by more than 40 people (x axis: Nr. of names; y axis: Nr. of people per name; series: Surnames, Forenames)
London ‘non-16+’ ethnic groups (1.2 million people in London stated ‘other’ ethnic identities in the 2001 Census)
Ethnic group | Population
Other white European, European Mixed | 185,690
Other white, white unspecified | 171,744
English | 154,203
Sri Lankan | 53,307
Black British | 46,348
Turkish | 37,827
Italian | 35,252
Other Mixed, Mixed unspecified | 35,027
Any other group | 29,469
Greek Cypriot | 23,340
Middle Eastern (excluding Israeli, Iranian and 'Arab') | 20,537
Arab | 20,256
Filipino | 19,669
Japanese | 19,415
Other mixed white | 19,239
Other Asian, Asian unspecified | 18,334
Greek | 17,888
Iranian | 16,494
Multi-ethnic islands | 15,952
Polish | 15,928
South and Central American | 15,607
British Asian | 14,625
Turkish Cypriot | 14,074
Vietnamese | 11,719
Commonwealth of (Russian) Independent States | 11,606
North African | 11,218
Kurdish | 9,659
Latin American | 9,188
Mixed Black | 9,001
Jewish | 8,912
Other Black, Black unspecified | 8,344
Cypriot (part not stated) | 7,360
Mixed: Irish and other white | 7,071
Scottish | 7,020
Kosovan | 6,896
Welsh | 6,895
Somali | 6,172
East African Asian | 5,328
Chinese and White | 4,871
Tamil | 4,758
Black and White | 4,226
Moroccan | 4,133
Caribbean Asian | 4,070
Black and Asian | 3,946
Malaysian | 3,384
Albanian | 3,226
Sikh | 2,814
Source: 2001 Census, GLA commissioned tables (.../...)
CEL Group
1 ANGLOPHONE
2 ANGLOPHONE: CARIBBEAN
3 BLACK AFRICAN: CONGOLESE
4 BLACK AFRICAN: ETHIOPIAN
5 BLACK AFRICAN: GAMBIAN
6 BLACK AFRICAN: GHANAIAN
7 BLACK AFRICAN: KENYAN
8 BLACK AFRICAN: LIBERIAN
9 BLACK AFRICAN: NIGERIAN
10 BLACK AFRICAN: SIERRA LEONEAN
11 BLACK AFRICAN: SOUTH AFRICAN
12 BLACK AFRICAN: UGANDAN
13 BLACK AFRICAN: UNCLASSIFIED
14 EAST ASIAN: CHINESE
15 EAST ASIAN: INDOCHINA
16 EAST ASIAN: JAPANESE
17 EAST ASIAN: KOREAN
18 EAST ASIAN: VIETNAMESE
19 EUROPEAN: BALKAN
20 EUROPEAN: BRITISH: UNCLASSIFIED
21 EUROPEAN: DANISH
22 EUROPEAN: DUTCH
23 EUROPEAN: DUTCH_WORLD
24 EUROPEAN: EASTERN EUROPE
25 EUROPEAN: FINNISH
26 EUROPEAN: FRENCH
27 EUROPEAN: FRENCH_WORLD
28 EUROPEAN: GERMAN
29 EUROPEAN: GERMAN OR DUTCH
30 EUROPEAN: GREEK / GREEK CYPRIOT
31 EUROPEAN: HUNGARIAN
32 EUROPEAN: IRISH: UNCLASSIFIED
33 EUROPEAN: ITALIAN
34 EUROPEAN: NORDIC
35 EUROPEAN: OTHER
36 EUROPEAN: POLISH
37 EUROPEAN: ROMANIAN
38 EUROPEAN: SLAVIC
39 EUROPEAN: SWEDISH
40 HISPANIC: BRAZILIAN
41 HISPANIC: CATALAN
42 HISPANIC: LATIN AMERICAN
43 HISPANIC: PORTUGUESE
44 HISPANIC: PORTUGUESE_WORLD
45 HISPANIC: SPANISH
46 HISPANIC: SPANISH_WORLD
47 HISPANIC: UNCLASSIFIED
48 JEWISH
49 MUSLIM: AFGHAN
50 MUSLIM: ARAB
51 MUSLIM: ARMENIAN
52 MUSLIM: BALKANS
53 MUSLIM: BANGLADESHI
54 MUSLIM: BLACK AFRICAN OTHER
55 MUSLIM: EGYPTIAN
56 MUSLIM: ERITREAN
57 MUSLIM: EURASIA
58 MUSLIM: IRANIAN
59 MUSLIM: IRAQI
60 MUSLIM: LEBANESE
61 MUSLIM: MIDDLE EASTERN
62 MUSLIM: NORTH AFRICAN
63 MUSLIM: PAKISTANI
64 MUSLIM: PERSONAL NAME
65 MUSLIM: SOMALI
66 MUSLIM: SOUTHEAST ASIA
67 MUSLIM: SUDANESE
68 MUSLIM: TURKISH
69 MUSLIM: UNCLASSIFIED
70 OTHER NON-BRITISH
71 OTHER SOUTH ASIAN: HINDI
72 OTHER SOUTH ASIAN: HINDI OR SIKH
73 OTHER SOUTH ASIAN: NEPALESE
74 OTHER SOUTH ASIAN: NORTH INDIAN
75 OTHER SOUTH ASIAN: SIKH
76 OTHER SOUTH ASIAN: SOUTH INDIAN & SRI LANKAN
77 OTHER SOUTH ASIAN: UNCLASSIFIED
3- Members
• Primarily aimed at PCTs and health institutions working on improving ethnicity classification
• Open to any institution interested in benefiting from and contributing to the ethnicity classification model
• Each member must already hold ‘operational names data’ at individual level
4- Data Sharing
• UCL will distribute each update of the Name-to-CEL directories to members:
- Surname-to-CEL
- Forename-to-CEL
• Members will provide 2 separate files:
- Surname-Birthplace aggregation
- Forename-Birthplace aggregation
• There will be no way to link these two files together
• Only 1 common version of the Name-to-CEL directory will be maintained
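The two member files can be pictured as independent aggregations of the same register. The sketch below assumes each record is a (surname, forename, birthplace) triple; that record layout and the function names are hypothetical. Each output row carries only a name, a birthplace and a count, so neither file holds a key that could join a surname row to a forename row or to an individual.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record layout: (surname, forename, birthplace)
Record = Tuple[str, str, str]


def aggregate_by_name(records: Iterable[Record]):
    """Build the Surname-Birthplace and Forename-Birthplace aggregations.

    Each result maps (name, birthplace) -> number of people. No record
    identifier is kept, so the two outputs share no key that could link
    them to each other or back to individuals.
    """
    surname_bp = Counter()
    forename_bp = Counter()
    for surname, forename, birthplace in records:
        bp = birthplace.strip().upper()
        surname_bp[(surname.strip().upper(), bp)] += 1
        forename_bp[(forename.strip().upper(), bp)] += 1
    return surname_bp, forename_bp


def to_rows(aggregation):
    """Flatten an aggregation into sorted (name, birthplace, count) rows for export."""
    return sorted((name, bp, n) for (name, bp), n in aggregation.items())


if __name__ == "__main__":
    register = [
        ("Silva", "Ana", "Lisbon"),
        ("Silva", "Joao", "Lisbon"),
        ("Nguyen", "Ana", "Hanoi"),
    ]
    surnames, forenames = aggregate_by_name(register)
    print(to_rows(surnames))   # surname file
    print(to_rows(forenames))  # forename file
```

Exporting sorted rows rather than rows in register order also prevents the two files from being aligned by position.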
Proposed Data Flow (1)
Example provided here for surnames. An exact parallel process applies for forenames.
Surnames from PCTs enter the UCL Input & Processing Module (highly restricted access):
• Birthplace Geocoder
• Records aggregated by surname:
SURNAME     TOTAL   COB1   COB2   COB3   COB4   ETC
SURNAME X   ZZ      37%    23%    12%    8%     ETC
• Check > threshold?
– N: leave surname until more records arrive
– Y: output surname to the Output Module (Surname-to-CEL)
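The input stage of this flow can be sketched in three steps: geocode each free-text birthplace to a country of birth (COB), aggregate the COB distribution per surname, and release only surnames above the threshold. In the sketch below the alias table and its entries, the `UNKNOWN` code and the function names are hypothetical; only the "over 10 persons / surname" threshold comes from the proposal.

```python
from collections import Counter, defaultdict

# Hypothetical birthplace alias table: free-text spellings -> country of birth (COB)
BIRTHPLACE_ALIASES = {
    "LISBON": "PORTUGAL",
    "LISBOA": "PORTUGAL",
    "PORTO": "PORTUGAL",
    "HANOI": "VIETNAM",
    "SAIGON": "VIETNAM",
    "HO CHI MINH CITY": "VIETNAM",
}

THRESHOLD = 10  # proposal: "over 10 persons / surname"


def geocode_birthplace(birthplace):
    """Map a free-text birthplace to a COB code, or 'UNKNOWN' if no alias matches."""
    key = " ".join(birthplace.upper().split())  # collapse case and internal whitespace
    return BIRTHPLACE_ALIASES.get(key, "UNKNOWN")


def surname_cob_profiles(surname_birthplace_counts, threshold=THRESHOLD):
    """Aggregate (surname, birthplace) counts into per-surname COB percentages.

    Returns (released, pending): `released` maps each surname held by more
    than `threshold` people to (total, {cob: percent}); `pending` maps the
    remaining surnames to their running totals while more records arrive.
    """
    by_surname = defaultdict(Counter)
    for (surname, birthplace), n in surname_birthplace_counts.items():
        by_surname[surname][geocode_birthplace(birthplace)] += n

    released, pending = {}, {}
    for surname, cobs in by_surname.items():
        total = sum(cobs.values())
        if total > threshold:
            shares = {cob: round(100 * n / total, 1) for cob, n in cobs.most_common()}
            released[surname] = (total, shares)
        else:
            pending[surname] = total
    return released, pending


if __name__ == "__main__":
    counts = {
        ("SILVA", "Lisbon"): 8,
        ("SILVA", "Porto"): 4,
        ("SILVA", "London"): 3,
        ("NGUYEN", "Hanoi"): 6,
    }
    released, pending = surname_cob_profiles(counts)
    print(released)  # {'SILVA': (15, {'PORTUGAL': 80.0, 'UNKNOWN': 20.0})}
    print(pending)   # {'NGUYEN': 6}
```

Birthplaces missing from the alias table (here "London") fall into `UNKNOWN`, which illustrates why more extensive birthplace name alias tables are listed as a benefit for the model.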
Proposed Data Flow (2)
From the Input & Processing Module to the Output Module (Surname-to-CEL):
• Current threshold = over 10 persons / surname
• CEL=COBN?
– Y: Surname-to-CEL assigned
– N: CEL = group of COBs; manual review
• Surname-to-CEL Directory: visual inspection
• Updates distributed
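The output stage's decision ("CEL=COBN": if yes, Surname-to-CEL assigned; if no, a group of COBs goes to manual review) is given only as a flowchart, so the rule below is one hypothetical reading of it. Each COB is mapped to a CEL group through an assumed lookup table, and a CEL is assigned automatically only when every COB above an assumed share cut-off points to the same CEL. The lookup entries, the 30% cut-off and the function name are illustrative assumptions, not the project's method.

```python
# Hypothetical COB -> CEL lookup, using group labels from the CEL list
COB_TO_CEL = {
    "PORTUGAL": "HISPANIC: PORTUGUESE",
    "BRAZIL": "HISPANIC: BRAZILIAN",
    "VIETNAM": "EAST ASIAN: VIETNAMESE",
    "POLAND": "EUROPEAN: POLISH",
}

MIN_SHARE = 30.0  # hypothetical: ignore COBs holding less than 30% of a surname


def assign_cel(cob_shares, min_share=MIN_SHARE):
    """Decide a surname's CEL from its {cob: percent} profile.

    Returns ("ASSIGNED", cel) when every COB at or above `min_share`
    maps to the same CEL; otherwise ("MANUAL_REVIEW", candidates), where
    candidates lists the CELs found plus any COBs missing from the lookup.
    """
    main_cobs = [cob for cob, share in cob_shares.items() if share >= min_share]
    unmapped = [cob for cob in main_cobs if cob not in COB_TO_CEL]
    cels = sorted({COB_TO_CEL[cob] for cob in main_cobs if cob in COB_TO_CEL})
    if len(cels) == 1 and not unmapped:
        return "ASSIGNED", cels[0]
    return "MANUAL_REVIEW", cels + sorted(unmapped)


if __name__ == "__main__":
    print(assign_cel({"PORTUGAL": 80.0, "UNKNOWN": 20.0}))
    print(assign_cel({"PORTUGAL": 45.0, "BRAZIL": 40.0, "UNKNOWN": 15.0}))
```

The first profile is dominated by one COB and is assigned directly; the second splits between two CELs and is returned as a candidate group for manual review, mirroring the two branches of the flowchart.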
5- Data Protection
• There will be no way to link the two files (surname and forename) together
• Records in the files will identify aggregations of either a surname or a forename, not individuals
• A minimum threshold of 10 persons per name will be applied before a name is processed and released to the output module
• A detailed data sharing framework document is being developed, to be signed by members
6- Intellectual Property
• Intellectual Property of the Names-to-CEL directory is held by University College London
• Access to this directory, and to the methods and tools developed in the project, will be granted free of charge to contributing members
• A fee will be charged to non-contributing members, under arrangements to be agreed
• Contributing members are those who provide data to improve the Name-to-CEL allocation
7- Project Name
GEONom: Geographic & Ethnic Origin of Names
www.casa.ucl.ac.uk/geonom
8- Open Discussion
8.1. Data Sharing and Data Protection
8.2. Methodology
8.3. Applications