EUROPEAN COMMISSION
EUROSTAT
Directorate E: Sectoral and regional statistics
Unit E-4: Regional statistics and geographical information
Doc. WG/LCU 47
LUCAS 2015 Survey
Eurostat – Unit E4
ITEM 3
Working Group
for Land Cover and Land Use Statistics
Meeting of 17 and 18 November 2016
Luxembourg BECH Building – Room Quetelet
starting on 17 November at 09h30 and planned to finish on 18 November at 13h00
CHAIRED BY: MR GUNTER SCHÄFER
No interpretation, English only
1 INTRODUCTION
2 LUCAS 2015 SURVEY
2.1 Quality assurance for the data collection
2.2 Updated data collection tool and new central server infrastructure
2.3 Field work related quality control
2.4 Eurostat quality control (review, validate and edit)
2.5 Analysis and dissemination of LUCAS Survey data
3 QUALITY INDICATORS FOR THE LUCAS SURVEY
3.1 Survey performance 2015
3.2 LUCAS Survey locational accuracy
3.3 Points rejected by external quality control
4 ANNEX 1
1 INTRODUCTION
Eurostat's mission is to be the leading provider of high quality statistics on Europe. Its task is to provide the European Union with statistics at European level that enable comparisons between countries and regions over time.
Land cover and land use statistical data (derived from the micro data collected in the LUCAS survey) are used by Eurostat to calculate Sustainable Development Indicators and land take. Landscape indicators are also derived from the data collected in the survey.
LUCAS micro data is also used for Agro-Environmental Indicators (AEI) and LULUCF (land use, land use change and forestry) indicators, feeds Europe's Resource Efficiency indicators, and is planned to be used in assessing the Good Agricultural and Environmental Condition (GAEC). Moreover, in the context of CORINE Land Cover (CLC) and all other pan-EU mapping initiatives, such as the Copernicus HRL (High Resolution Layers), LUCAS micro data and photos (1) are used for production, verification and validation processes.
LUCAS surveys have been carried out since 2000. The survey's purpose, scope and methodology have evolved over the years; since 2006 it has taken place every three years. So far the LUCAS Survey has been managed centrally by Eurostat through service contracts with private companies and/or public authorities. From the data collected in the survey, Eurostat produces fully harmonised and comparable statistics, indicators and in-situ data on land cover, land use and landscape at NUTS 2 level. A landscape photo archive is also part of the information produced and disseminated.
Commission DGs are the main final clients of the data collected in the LUCAS survey (also known as micro data). The LUCAS survey has to be financed by the Commission DGs and the survey cannot take place if there is no commitment on their side. The user needs for the 2015 survey were confirmed.
In 2010, upon request of DIMESA and the CPSA, Eurostat set up an Advisory Group to discuss and elaborate recommendations for a long-term LUCAS strategy. The group was composed of experts from 10 Member States (BG, DE, FR, ES, HU, PL, FI, IT, RO, UK) nominated by the CPSA and DIMESA committees, of CH, and of representatives of Commission services (ENV, CLIMA, ENTR, AGRI, JRC), the EEA and FAO.
(1) Point and landscape photos in the four cardinal directions collected during the LUCAS in-situ survey
Primary data for the campaigns of 2006 through 2015 are available online in the dedicated land cover/use statistics section.
2 LUCAS 2015 SURVEY
The LUCAS 2015 survey covered 273 401 points in all 28 EU Member States. A further 66 742 points were later photo-interpreted in the office. Of these, a number of points are not disseminated, as they were found to fall on the open sea or outside the NUTS and transitional-water limits.
The survey ran smoothly in most countries. The companies responsible for the 5 field work lots and the Croatian grant beneficiaries have submitted all points to the external quality control. The external quality control has again proven to be very important for the overall data quality.
2.1 Quality assurance for the data collection
Quality assurance is a central component throughout all phases of the LUCAS survey, so as to ensure the quality and comparability of the results.
Quality assurance covers different aspects, first of all the provision of a common framework for all participants. This is especially important as the survey has been split up in several lots, which have been contracted to different entities and a common understanding across the lots needs to be assured. To this end the following actions have been foreseen:
- Common documentation and instructions for all surveyors
- A common "Frequently Asked Questions and Answers" document, updated regularly based on issues raised by the contractors during the running of the survey
- A standardised and automated Data Management Tool (DMT)
- Common training for all the Survey Managers
The training for the survey managers includes indoor sessions - covering the overall approach, the survey instructions and the data management tool - as well as a field trip to allow for hands-on experience.
A second part of the quality assurance is related to the field work and includes:
- Follow-up visit to each country by a team of experts
- Internal quality check
- Independent data quality control
- Eurostat quality control
The field work related support by an expert team aims to identify and correct systematic errors in data collection and survey management as early as possible. The information collected concerned the set-up of the survey, the number of surveyors and their training, communication, and quality control. Based on the results, a second round of follow-up visits was organised to propose corrective measures where needed.
2.2 Updated data collection tool and new central server infrastructure
In 2015 a significant change was introduced in the Data Management Tool: a central server infrastructure was created, allowing immediate access to the data by the upper levels of control (2). The heavy client (in MS Access) was kept and was still used for data collection, validation of internal consistency, and as the support for the visual quality control acceptance and rejection of points.

(2) Note however that, for checks to be performed by the external quality contractors, the points first have to be released by the Central Offices of the field work contractors.
2.3 Field work related quality control
The internal quality check took place at the field work contractor’s regional or central offices and concerned all the data collected for all the LUCAS points in the 28 participating countries.
An independent external visual quality control of the data for over one third of the points was assured by a separate expert team of data controllers. All available information (ancillary information, ground documents, metadata on the survey, land cover and land use classification, transect data, GPS tracks, photos, justification for photo-interpretation) was analysed to evaluate the reliability of the results. Point data that clearly required correction or clarification was rejected and sent back to the field work contractors.
After a second control, the data was forwarded to Eurostat, where a further quality control took place. In particular, points rejected twice are checked to guarantee compliance with the tender specifications (for each country, no more than 1% of the points of the survey may be rejected twice).
In this campaign an additional control for checking photo anonymization was added.
The LUCAS viewer also allowed for an integrated display of all the elements needed for the quality control. However, the entry of quality check comments and the approval or refusal of points had to be done in the heavy DMT client, requiring that the data and files be downloaded and re-uploaded.
Figure 1 – LUCAS 2015 DMT Map. A point was entered with wrong coordinates and has to be corrected.
2.4 Eurostat quality control (review, validate and edit)
In Eurostat the quality control first includes the consolidation of the "raw" data set. Further steps of the validation process (3) include, for example, consistency checks with other datasets of the same domain (previous years' LUCAS data) and consistency with data of other data providers. Currently the raw data published in the LUCAS dedicated section (4) is validated at level 0 (internal consistency) and level 1 (business rules). Eurostat is now performing further quality checks (levels 1 to 5, including cross-checks, confrontation with data from different sources, etc.). The process is still ongoing. More details on the state of the art of the validation process can be found in Annex 1.

(3) Bosch et al. (2015) Methodology for data validation. ESSNET VALIDAT Foundation, https://ec.europa.eu/eurostat/cros/system/files/methodology_for_data_validation_v1.0_rev-2016-06_final.pdf (2016.10.17)
Currently Eurostat is also applying a number of macro- and micro-editing techniques in order to fine-tune the final estimates. The identification of possible influential errors may be fed back into the validation process and imply further corrections to the micro data.
2.5 Analysis and dissemination of LUCAS Survey data
A first release of formally corrected records was published in the LUCAS dedicated section in July 2016, together with the LUCAS 2x2 Grid.
The 2015 photo archive is also available, and orders can be made through the order form in the LUCAS dedicated section (5).
3 QUALITY INDICATORS FOR THE LUCAS SURVEY
For LUCAS 2015 a quality report according to Eurostat standards is in progress and will be published by the end of the year. This report will include a step-by-step description of the methodology (sampling design, ground survey, additional photo interpretation, quality control and post-processing). It will also include the relevant information on the accuracy and reliability of the results, both for the micro data and for the statistical aggregates for land cover and land use, as well as on their coherence and comparability.
Some indicators for the LUCAS Survey are given below.
3.1 Survey performance 2015
In 2015, 729 surveyors were recruited for a total of 273 401 points to be visited on the ground. A further 64 photo interpreters were responsible for the photo interpretation of 66 742 points. The average number of points per surveyor in 2015 was 372 (see Table 1), compared to 366 in 2012 and 405 in 2009. There were, however, important differences between the countries, and the maximum average number of points per surveyor was 732 (see Table 1).
(4) http://ec.europa.eu/eurostat/web/lucas/data/primary-data/2015
(5) http://ec.europa.eu/eurostat/web/lucas/data/primary-data/order-form
Table 1: LUCAS 2015 - Organisation of the work (field survey)
COUNTRY  # OF POINTS  # OF SURVEYORS  AVG POINTS PER SURVEYOR  START DATE  END DATE  WORKING DAYS  MAN DAYS  AVG POINTS PER DAY
AT 6 679 21 318 30/Apr/15 16/Dec/15 141 822 8.1
BE 2 412 5 482 23/May/15 10/Nov/15 110 260 9.3
BG 6 621 35 189 26/Apr/15 02/May/16 225 1 028 6.4
CY 1 442 5 288 30/Mar/15 01/Oct/15 103 190 7.6
CZ 5 492 9 610 21/Apr/15 10/Sep/15 127 583 9.4
DE 24 887 56 444 01/May/15 29/Jan/16 178 2 139 11.6
DK 3 443 5 689 05/May/15 26/Sep/15 119 289 11.9
EE 2 250 5 450 02/May/15 02/Dec/15 131 194 11.6
EL 7 810 37 211 11/Apr/15 08/Nov/15 198 983 7.9
ES 35 227 61 578 13/Mar/15 22/Sep/15 188 2 731 12.9
FI 13 379 34 394 29/May/15 22/Oct/15 147 1 451 9.2
FR 38 413 112 343 12/May/15 10/Mar/16 203 3 843 10.0
HR 3 531 13 272 18/Mar/15 18/Mar/16 188 563 6.3
HU 4 626 30 154 07/Jan/15 08/Oct/15 86 799 5.8
IE 3 461 9 385 18/May/15 17/Dec/15 150 513 6.7
IT 20 919 80 262 26/Feb/15 29/Jan/16 208 2 574 8.1
LT 3 873 10 387 01/May/15 25/Nov/15 134 281 13.8
LU 206 46 5 21/May/15 12/Nov/15 16 57 3.6
LV 4 497 13 346 07/Jan/15 12/Jan/16 152 397 11.3
MT 78 1 78 10/May/15 20/May/15 10 10 7.8
NL 2 211 4 553 04/May/15 16/Oct/15 114 203 10.9
PL 21 719 34 639 05/Jan/15 25/May/16 210 1 565 13.9
PT 7 315 10 732 08/May/15 29/Sep/15 118 520 14.1
RO 14 230 23 619 05/May/15 28/Oct/15 117 1 076 13.2
SE 22 317 39 572 08/May/15 11/Feb/16 251 2 154 10.4
SI 1 614 8 202 13/May/15 06/Oct/15 116 224 7.2
SK 2 438 6 406 04/May/15 23/Sep/15 96 341 7.1
UK 12 063 18 670 20/Apr/15 23/Feb/16 180 1 570 7.7
EU 273 153 729 375 05/Jan/15 25/May/16 27 360 10.0
For the photo interpretation the number of points per day is considerably larger, as expected (see Table 2).
Figures 2 and 3 – Average number of points surveyed per surveyor and per day, by country
Table 2 - LUCAS 2015 - Organisation of the work (PI extension)
COUNTRY  # OF POINTS  # OF PHOTO INTERPRETERS  START DATE  END DATE  MAN DAYS  AVG POINTS PER DAY
AT 2 156 3 24/Mar/16 06/May/16 52 42
BE 487 2 28/Jan/16 01/Feb/16 5 97
BG 1 051 4 09/May/16 14/May/16 24 44
CY 284 1 15/Mar/16 16/May/16 19 15
CZ 221 1 02/Mar/16 09/May/16 19 12
DE 1 861 2 18/Feb/16 30/Mar/16 36 52
DK 219 1 07/Mar/16 09/Mar/16 3 73
EE 386 2 02/Mar/16 15/Apr/16 18 21
EL 4 734 5 01/Mar/16 09/May/16 98 48
ES 15 049 4 01/Jun/15 08/May/16 232 65
FI 2 729 2 31/Mar/16 26/Apr/16 32 85
FR 9 792 2 01/Feb/16 14/Mar/16 50 196
HR 0 0 0
HU 543 4 04/May/16 09/May/16 13 42
IE 1 474 1 09/Feb/16 07/Mar/16 18 82
IT 7 837 5 03/Feb/16 06/May/16 116 68
LT 710 1 23/Mar/16 15/Apr/16 16 44
LU 46 1 10/Mar/16 10/Mar/16 1 46
LV 878 1 17/Mar/16 06/May/16 32 27
MT 0 0 0
NL 381 2 16/Mar/16 17/Mar/16 3 127
PL 1 364 2 22/Feb/16 26/Mar/16 22 62
PT 1 742 1 05/Feb/16 04/Apr/16 31 56
RO 2 493 11 12/Feb/16 18/Feb/16 36 69
SE 4 365 2 16/Feb/16 31/May/16 61 72
SI 310 1 11/Mar/16 30/Mar/16 12 26
SK 317 1 01/Apr/16 25/Apr/16 17 19
UK 4 980 2 03/Mar/16 19/Apr/16 55 91
EU 66 409 64 09/May/16 31/May/16 1 021 65
The time spent per point in the field survey varies between 18 and 44 minutes, the average being 24 minutes, the same as in 2012 (see Figure 4).
The average time needed to visit each point depends on the land cover and land use of the point and its surroundings, and is obviously also strongly related to the distance from the point to the nearest road. Surveyors first had to reach the point and then walk along a transect of 250 m towards the East. In general, points in forest and wetlands were the most difficult to reach (see Table 3 and Figure 5).
Figure 4 - Average time spent per point by country (in min)
Table 3 - LUCAS 2015 Average time spent during the point survey and photo interpretation (regular field survey, including observation type 7 "ex-ante") by land cover class
LAND COVER CLASS DISTANCE < 100m DISTANCE > 100m PI EX-ANTE PI
Artificial (5.2%) 20:44 17:13 17:13 04:01
Crop (30%) 22:44 17:57 15:42 07:00
Wood (33.6%) 30:46 20:08 18:41 03:15
Shrub (5.2%) 27:11 20:17 17:10 07:50
Grass (21.2%) 23:54 18:37 16:24 09:06
Bare (1.9%) 21:48 17:29 15:40 09:07
Water (2%) 25:11 18:36 14:29 02:13
Wetland (0.9%) 30:34 22:33 16:15 02:03
3.2 LUCAS Survey locational accuracy
A first quality indicator for the LUCAS Survey concerns the distance from the actual point of observation to the theoretical point of observation.
This indicator covers the following categories:
- Field survey: point visible, distance 0 – 100 m
- Field survey: point visible, distance > 100 m
- Photo interpretation in the field (when the point is not reached or not visible)
- Ex-ante photo interpretation

Figure 5 - European average time spent per point by main land cover class (regular field survey, including observation type 7 "ex-ante", time spent more than 2 hours excluded, grid of 7-minute intervals)
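The four categories above amount to a simple classification rule on the surveyed distance and on how the observation was made. A minimal sketch (the function and parameter names are illustrative, not the DMT's actual schema; only the 100 m threshold comes from the text):

```python
def classify_observation(distance_m, visited_in_field, point_visible):
    """Assign a LUCAS observation to one of the four locational-accuracy
    categories used in the quality indicators (illustrative sketch)."""
    if not visited_in_field:
        return "ex-ante PI"              # photo-interpreted in the office
    if not point_visible or distance_m is None:
        return "field PI"                # point not reached or not visible
    if distance_m <= 100:
        return "field survey, 0-100 m"
    return "field survey, > 100 m"

# e.g. a point observed in the field from 35 m away:
print(classify_observation(35, True, True))   # field survey, 0-100 m
```

Counting points per category and per country would then reproduce the breakdown shown in Table 4.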
Table 4 – Number of surveyed points by type of observation (points of the field survey without GPS coordinates are excluded from the assessment)
COUNTRY  ∑  < 100 m  > 100 m  Field PI  Ex-ante PI  Extended PI ∑
AT 6 679 5 426 89 1 163 2 156
BE 2 412 2 237 39 135 487
BG 6 621 5 290 335 964 29 1 051
CY 1 442 1 212 61 112 56 284
CZ 5 492 5 328 88 76 221
DE 24 887 22 707 1 409 770 1 861
DK 3 443 3 197 174 72 219
EE 2 250 1 799 91 341 19 386
EL 7 810 6 149 609 842 210 4 734
ES 35 227 30 353 1 367 3 507 15 049
FI 13 379 11 266 824 1 289 2 729
FR 38 413 35 487 762 2 128 27 9 792
HR 3 531 2 770 32 349 378 0
HU 4 626 4 376 108 142 543
IE 3 461 2 572 361 522 5 1 474
IT 20 919 15 808 1 634 3 446 29 7 837
LT 3 873 3 624 52 168 29 710
LU 206 203 3 46
LV 4 497 3 800 91 584 21 878
MT 78 69 1 8 0
NL 2 211 1 951 156 104 381
PL 21 719 19 655 841 1 223 1 364
PT 7 315 6 437 186 692 1 742
RO 14 230 10 220 751 3 259 2 493
SE 22 317 15 975 772 1 213 4 356 4 365
SI 1 614 1 503 14 97 310
SK 2 438 2 098 91 249 317
UK 12 063 9 210 1 031 1 752 65 4 980
EU 273 153 230 722 11 969 25 210 5 224 66 409
Figure 6 – Average distance of observation by country (field survey, all land cover classes)
Figure 7 – Percentage of points per observation type per country (field survey): 1 for < 100 m, 2 for > 100 m and 3 for PI in the field
For the majority of countries, more than 80% of the points could be reached and surveyed at a distance of less than 100 m, without any need for photo interpretation in the field.
Note that the accuracy varies considerably between land cover classes. The lowest accuracy was found for the land cover class "Wetland", with a mean distance of 675 m between the point reached and the theoretical point of observation, while the highest accuracy was reached for "Artificial", with an average distance of 33 m.
3.3 Points rejected by external quality control
An indirect indication of the quality of the results is given by the rejection rate during the external quality control. This control was done on 40% of the points surveyed and 16% of the ex-ante PI points. The total number and the rate of points checked by country are presented in Table 5 and Table 6.
The first rejection rate varies between 8% (BG) and over 70% (HR), with an average of 25% for the regular survey. The values are much lower for the PI extension, with several countries at or close to a 0% first rejection rate and an overall first rejection rate of 8%.
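The checked and first-rejection percentages in Table 6 follow directly from the counts; e.g. for BG, 951 of 6 621 surveyed points were checked and 77 were refused at least once. A minimal sketch of the computation (function name is illustrative):

```python
def rates(surveyed, checked, refused_once):
    """Share of points checked and first rejection rate, as percentages."""
    pct_checked = 100 * checked / surveyed
    pct_refused = 100 * refused_once / checked  # rejection rate among checked points
    return round(pct_checked, 1), round(pct_refused, 1)

# Bulgaria, from Table 6: 6 621 surveyed, 951 checked, 77 refused once
print(rates(6621, 951, 77))   # (14.4, 8.1)
```

Note that the first rejection rate is computed on checked points, not on all surveyed points, which is why a country with a low checking share can still show a high rejection rate.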
Figure 8 – Average distance of observation by land cover class
Table 5 - Points in survey and PI extension, checked and refused at least once per country and observation type
COUNTRY  |  Points checked by external contractor: ∑  < 100 m  > 100 m  PI  Ext PI  |  Points refused at least once: ∑  < 100 m  > 100 m  PI  Ext PI
AT 3107 2090 37 980 340 544 373 4 167 28
BE 913 776 10 127 76 240 215 25 8
BG 1198 884 67 247 166 99 77 22 5
CY 744 596 39 109 44 219 187 5 27 5
CZ 1851 1745 30 76 34 454 431 23 2
DE 9534 8266 513 755 293 2229 1932 116 181 10
DK 1179 1057 56 66 36 259 235 9 15 2
EE 820 587 25 208 61 120 91 2 27 2
EL 4004 2932 232 840 754 825 606 49 170 119
ES 20160 16037 617 3506 2378 8211 6315 223 1673 187
FI 5021 3530 247 1244 426 647 408 18 221 5
FR 14107 11875 197 2035 1545 3465 2987 23 455 132
HR 1388 1153 10 225 977 827 8 142
HU 1389 1250 27 112 77 261 240 1 20
IE 2024 1389 158 477 233 412 291 28 93 11
IT 6325 4172 327 1826 1239 1465 1113 88 264 72
LT 1245 1081 12 152 112 115 107 8
LU 113 110 3 7 54 52 2 1
LV 1776 1188 19 569 138 285 169 7 109 5
MT 41 33 8 18 13 5
NL 875 716 55 104 60 192 159 11 22 2
PL 7780 6380 284 1116 214 1255 1090 25 140 2
PT 4640 3867 81 692 275 1746 1464 22 260 27
RO 6849 3711 316 2822 394 776 595 24 157 10
SE 3713 2642 123 948 690 413 353 17 43 77
SI 639 538 6 95 48 124 112 1 11 10
SK 1038 761 28 249 50 173 115 2 56
UK 7268 5122 475 1671 791 1680 1212 95 373 66
EU 109741 84488 3991 21262 10481 27258 21769 778 4711 788
Table 6 - Number of total points (excluding PI in the field), points checked and first rejection rates by country
NUTS Surveyed points Checked Refused once % of checked % of refused at least once
HR 3531 1163 835 32.9 71.8
LU 206 110 52 53.4 47.3
MT 78 33 13 42.3 39.4
ES 35227 16654 6538 47.3 39.3
PT 7315 3948 1486 54 37.6
CY 1442 635 192 44 30.2
BE 2412 786 215 32.6 27.4
IT 20919 4499 1201 21.5 26.7
FR 38413 12072 3010 31.4 24.9
CZ 5492 1775 431 32.3 24.3
UK 12063 5597 1307 46.4 23.4
DE 24887 8779 2048 35.3 23.3
NL 2211 771 170 34.9 22
DK 3443 1113 244 32.3 21.9
SI 1614 544 113 33.7 20.8
EL 7810 3164 655 40.5 20.7
IE 3461 1547 319 44.7 20.6
HU 4626 1277 241 27.6 18.9
AT 6679 2127 377 31.8 17.7
PL 21719 6664 1115 30.7 16.7
RO 14230 4027 619 28.3 15.4
EE 2250 612 93 27.2 15.2
SK 2438 789 117 32.4 14.8
LV 4497 1207 176 26.8 14.6
SE 22317 2765 370 12.4 13.4
FI 13379 3777 426 28.2 11.3
LT 3873 1093 107 28.2 9.8
BG 6621 951 77 14.4 8.1
EU 273153 88479 22547 32.4 25.5
The first rejection rate is clearly higher than in the previous campaign, where it reached 13%. Note, however, that the specific management of soil points in triplets has somewhat distorted the first rejection values compared with the previous campaign: points belonging to a triplet are all rejected if at least one of them had to be rejected. Still, this accounts for only a small part of the increase.
In 2015 the main causes for rejection were clearly the codification of the transect and missing additional information (see Figure 9). Missing geo-tags in photos was a local issue in ES, IE, IT and the UK.
Figure 9 – Distribution of comments related to rejections by the external quality controllers.
Although rejected points are sent back to the contractors for correction and the second rejection rate is below 1% for all countries, these results emphasise the need to perform further quality checks at Eurostat level for points which have not gone through the external quality controls.
4 ANNEX 1 – ONGOING TASKS FOR THE VALIDATION OF LUCAS' MICRO DATA
Data validation is an activity verifying whether or not a combination of values is a member of a set of acceptable combinations (in "Methodology for data validation" (6)).
According to the authors, the set of 'acceptable values' may be a set of possible values for a single field, but under this definition it may also be a set of valid value combinations for a record, column, or larger collection of data. We emphasise that the set of acceptable values does not need to be defined extensively. This broad definition is introduced to make data validation refer both to micro and macro (aggregated) data. Data validation assesses the plausibility of data: a positive outcome will not guarantee that the data is correct, but a negative outcome will guarantee that the data is incorrect.
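In this framework a validation rule is simply a predicate testing membership of the set of acceptable combinations, whether for a single field or for a combination of fields. A minimal sketch (the field names, codes and bounding box are illustrative, not the actual LUCAS record layout):

```python
# A rule maps a record (here a dict) to True (plausible) or False (rejected).

def lc_code_known(record, valid_codes=frozenset({"A11", "B11", "C10", "E10"})):
    """Single-field rule: the land cover code must come from the code list."""
    return record["LC1"] in valid_codes

def gps_in_plausible_bbox(record):
    """Combination rule: coordinates must fall in a plausible EU bounding box."""
    return -32.0 <= record["LON"] <= 35.0 and 27.0 <= record["LAT"] <= 72.0

record = {"LC1": "B11", "LON": 6.13, "LAT": 49.61}   # an illustrative point
print(all(rule(record) for rule in (lc_code_known, gps_in_plausible_bbox)))  # True
```

Passing both rules only makes the record plausible; as the definition above stresses, it does not guarantee the record is correct.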
The relation with statistical data editing must be clarified. In the Generic Statistical Business Process Model (GSBPM) the process 'Validate and Review' is distinguished from the process 'Edit and Impute'. Data validation as described above takes place in the 'Validate and Review' phase, while the action of 'changing data' is placed in the 'Edit and Impute' phase. This is the idea underlying the validation definition (in "Methodology for data validation").
Figure 10 – Validation steps according to the ESSNET Validation Manual
Level 0: consistency of the data with their expected IT requirements
For these quality checks, only the structure of the file or the format of the variables is needed as input; no checks on the data values are performed.
Y/N OBS Next action
the file has been sent/prepared by the authorised authority (data sender);
Y Finished
the column separator / end of record symbol are correctly used
Y Finished
the file has the expected number of columns (agreed format of the file)
Y Finished
(6) https://ec.europa.eu/eurostat/cros/content/methodology-data-validation-10-handbook-revised-edition-june-2016_en
the columns have the expected format of the data (i.e., alphanumeric, numeric, etc.)
Y Finished
the file complies with the naming convention (original)
Y Finished
the file complies with the naming convention (derived datasets)
pending No naming convention was defined for derived datasets
Define naming convention for derived datasets
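Level 0 checks of this kind can be automated against the agreed file format without interpreting any of the values as data. A minimal sketch for a delimited text file (the column names, count and separator are illustrative assumptions, not the agreed LUCAS file format):

```python
import csv
import io

def level0_check(raw_text, expected_columns, delimiter=","):
    """Level 0: verify separator use and column count; the values themselves
    are not checked at this level."""
    rows = list(csv.reader(io.StringIO(raw_text), delimiter=delimiter))
    bad = [i for i, row in enumerate(rows, 1) if len(row) != expected_columns]
    return {"rows": len(rows), "bad_rows": bad, "passed": not bad}

# Illustrative 3-column sample; the third record is missing a field.
sample = "POINT_ID,NUTS0,LC1\n47862684,LU,B11\n47883456,LU\n"
print(level0_check(sample, expected_columns=3))
# {'rows': 3, 'bad_rows': [3], 'passed': False}
```

A failure here stops the pipeline before any level 1 business rules are run, matching the ordering of the validation levels in Figure 10.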
Level 1: consistency within the data set
Only the (statistical) information included in the file itself is needed.
During the LUCAS Survey data collection, the DMT already includes 218 embedded checks. This increases the quality of the data by avoiding systematic errors.
Y/N OBS Next action
the records conform to the latest business checks included in the DMT Client Business rules (v.1.1.9)
Pending Presently not all points have been checked against the latest version of the business rules (U:\LUCAS\010 CONTRACTS\2014\08441.2014.002-2014.408-LUCAS2015-LOT7-A1\003-FollowUp\004-Deliverables\Contrls\LUCAS_DMT_PARAMETER_20160301_119.mdb). It is known which version each point was checked against.
A full check has to be run on all records.
the records conform to the checks indicated on the issue log
Pending Presently only a minor part of the 149 bulk checks [update 2016.10.17] identified in the log has been performed.
U:\LUCAS\010 CONTRACTS\2014\08441.2014.002-2014.000-LUCAS2015\010-QualityChecks\24.Issue_Log.LUCAS2015_ESTATQC.20150618.xlsx
Prioritize, classify (see Annex A of the methodology handbook), and run the checks. Identify further needs for additional checks.
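Level 1 business rules of the kind embedded in the DMT can be run in bulk as record-level consistency checks. A hedged sketch of one such rule (the codes and the rule itself are illustrative, not one of the actual 218 checks stored in the DMT parameter database):

```python
def lu_consistent_with_lc(record):
    """Level 1: within a single record, land use must be consistent with
    land cover; here, an artificial land cover must not carry an
    agricultural land use. (Illustrative rule and codes only.)"""
    artificial_lc = record["LC1"].startswith("A")
    agricultural_lu = record["LU1"] == "U111"   # assumed code for agriculture
    return not (artificial_lc and agricultural_lu)

records = [
    {"POINT_ID": 1, "LC1": "A21", "LU1": "U111"},   # flagged as inconsistent
    {"POINT_ID": 2, "LC1": "B11", "LU1": "U111"},   # plausible
]
flagged = [r["POINT_ID"] for r in records if not lu_consistent_with_lc(r)]
print(flagged)   # [1]
```

Running all such rules over the full data set, and recording which rule version each point was checked against, is exactly the "full check on all records" called for above.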
Level 2: consistency with other data sets within the same domain and within the same data source
Validation level 2 is concerned with checks of consistency based on comparing the content of the file with the content of "other files" referring to the same statistical system (or domain) and the same data source.
Level 2A
In validation level 2A the "other files" refer to other versions of exactly the same file. In this case the quality checks are meant to detect "revisions" compared with previously sent data. Detection and analysis of revisions can be useful, for example, to verify whether revisions are consistent with outliers detected in previous quality checks (corrections), or to estimate the impact of the revisions on the "to be published" results, for the benefit of the users.
Y/N OBS Next action
the records in latest version include all the corrections of the previous versions
Not applicable
Presently we are dealing with the raw micro data, not yet edited. May become relevant at a later stage
Not applicable
Level 2B
In validation level 2B, "other files" can be versions of the same data set referring to other time periods. These checks are usually referred to as "time series checks" and are meant to verify the plausibility of the time series.
Y/N OBS Next action
the records are plausible against data from LUCAS SU 2006
pending Panel points where issues were identified by the surveyor are coded (BP codes), but no action was performed
Quantify number of points and prioritize
the records are plausible against data from LUCAS SU 2009
pending Panel points where issues were identified by the surveyor are coded (BP codes), but no action was performed
Quantify number of points and prioritize
the records are plausible against data from LUCAS SU 2012
pending Panel points where issues were identified by the surveyor are coded (BP codes), but no action was performed
Quantify number of points and prioritize
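For panel points, a level 2B check compares the current observation with the same point in earlier campaigns and flags implausible transitions. A minimal sketch (the transition rule and codes are illustrative; real plausibility would follow the BP coding mentioned above):

```python
def implausible_transition(lc_previous, lc_current):
    """Level 2B: flag land cover changes that are unlikely between two
    campaigns, e.g. sealed artificial land reverting to wood or crop in
    three years. (Illustrative rule and codes only.)"""
    unlikely = {("A11", "C10"), ("A11", "B11")}   # artificial -> wood / crop
    return (lc_previous, lc_current) in unlikely

# Land cover observed in 2012 and 2015 for two illustrative panel points
points = {47862684: ("A11", "C10"), 47883456: ("B11", "B11")}
flagged = [pid for pid, lcs in points.items() if implausible_transition(*lcs)]
print(flagged)   # [47862684]
```

Quantifying the flagged points per campaign pair (2006, 2009, 2012) is the prioritisation step listed as the next action above.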
Level 2C
In validation level 2C the "other files" can refer to other data sets from the same data provider, referring to the same or other correlated time periods. Sometimes a group of data sets (same country, same reference period) is sent at the same time; for example, an enterprise included in the admin data must be part of the predetermined population (from the Business Register). Likewise, three files could be sent at the same time, from the same country and referring to the same time period: one file with data for "females", one for "males", one for "total"; consistency between the results of the three files can be checked. Another example: results from annual data sets can be compared with the results of the corresponding quarterly data sets.
Y/N OBS Next action
the records are plausible against data in the Master 2015 (strata)
pending compare observed data against strata in the Master 2015
Quantify number of points and prioritize
all points in the LUCAS 2015 survey table are part of the Master 2015
Y Finished
Level 3: consistency within the same domain between different data sources
Validation level 3 is concerned with checks of consistency based on comparing the content of the file with the content of "other files" referring to the same statistical system (or domain) but coming from a different data source.
For instance the "other files" can refer to the same data set, but from another data provider (e.g., other countries of the ESS). Mirror checks are included in this class. Mirror checks verify the consistency between declarations from different sources referring to the same phenomenon, e.g., export declared by country A to country B should be the same as import declared by country B from country A.
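A mirror check can be sketched in the same spirit, comparing the two declarations of the same flow (the trade example from the text; the figures and the tolerance threshold are invented for illustration):

```python
def mirror_check(declared_by_a, declared_by_b, tolerance=0.05):
    """Level 3 mirror check: two declarations of the same flow should agree
    within a relative tolerance (5% here, an assumed threshold)."""
    if max(declared_by_a, declared_by_b) == 0:
        return True   # both zero: trivially consistent
    gap = abs(declared_by_a - declared_by_b) / max(declared_by_a, declared_by_b)
    return gap <= tolerance

# export A->B declared by A vs import from A declared by B (invented values)
print(mirror_check(1_000_000, 980_000))   # True  (2% gap)
print(mirror_check(1_000_000, 700_000))   # False (30% gap)
```

As the table below notes, this class of check does not apply to the LUCAS raw micro data itself.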
Y/N OBS Next action
the records are plausible against data in the same domain coming from different data sources
Not applicable
the described case does not apply to LUCAS Survey raw micro data
Not applicable
Level 4: consistency between separate domains in the same data provider
Validation level 4 could be defined as plausibility or consistency checks between separate domains available in the same institution. The availability implies a certain level of "control" over the methodologies by the concerned institution. These checks could be based on the plausibility of results describing the "same" phenomenon from different statistical domains. Examples: unemployment from registers and from the Labour Force Survey, or inhabitation of a dwelling (from a survey of owners of houses and dwellings vs. from the population register).
Y/N OBS Next action
the records are plausible against data in separate domains in Eurostat
Not applicable
this validation is of interest for aggregated results (e.g. crop statistics, forest statistics, transport networks)
Not applicable
Level 5: consistency with data of other data providers
Validation level 5 could be defined as plausibility or consistency checks between the data available in the data provider (institution) and the data / information available outside the data provider (institution). This implies no "control" over the methodology on the basis of which the external data are collected, and sometimes a limited knowledge of it.
Y/N OBS Next action
the records are plausible against data of CLC
pending possible against existing data of CLC00, CLC06 and CLC12
Contingency table of LUCAS x CLC ongoing
the records are plausible against data of OSM
pending possible but according to previous trials still of low value as there are topological issues in the available OSM datasets
None for the moment
the records are plausible against data of the Urban Audit
pending possible but probably of low interest as this is normally an aggregation of CLC classes
None for the moment
the records are plausible against data from EEA Transitional waters
pending partly done, issues are identified
Prepare for correction
the records are plausible against data from EEA Coastal waters
pending partly done, issues are identified
Prepare for correction
the records are plausible against data from EBM NUTS
pending partly done, issues are identified
Prepare for correction