Editing process and its quality regarding design and production phases using process metadata and calculation modules

Pauli Ollila
15 September 2015
Work Session on Statistical Data Editing
Topic (iv): Evaluation and feedback



Contents
- Different phases of creating and using the editing strategy
- GSBPM and the phases of creating and using the editing strategy
- Generic E&I process and process step definitions from GSDEM
- Designing editing strategy before production
- One process step with functions, modules and production actions
- Metadata in editing process
- Planning, production and development tasks
- Problematic situations and unexpected tasks in production
- Development tasks at different levels
- Conclusions (in a bit straightforward manner)


DIFFERENT PHASES OF CREATING AND USING THE EDITING STRATEGY


1) DESIGNING THE EDITING STRATEGY
The phase includes a variety of planning and decision actions for editing, ranging from the construction of the process flow to selecting principles for carrying out edit rules.

2) CREATING THE REALIZATION SYSTEM OF THE STRATEGY
The phase includes the IT choices for carrying out the editing strategy, covering both solutions for different parts of the editing strategy and possibly decisions on interactions between different IT environments and data structures. In principle this phase could also include decisions about non-IT practices belonging to the strategy, e.g. paper questionnaire studies. There might be various ways to carry out the methods, but the corresponding IT solutions for the methods are denoted here as modules.

3) TESTING THE STRATEGY
The phase includes operations in the realization (IT) system of the editing strategy with specific test data sets (unedited, edited), in practice mostly from earlier rounds of the production of statistics. The testing procedure may have a systematic structure, but known statistic-dependent problematic areas can also be studied.

4) APPLYING THE STRATEGY IN PRODUCTION
The phase contains the implementation of the editing strategy with chosen methods and parameterization in the corresponding IT system, with the data set(s) to be edited in the production process of statistics. Note that editing in production can also happen in the collection phase, i.e. while the data are being acquired gradually.

The editing process “contains a number of activities or tasks that aim to assess the plausibility of the data, identify potential problems and perform certain selected actions that intend to remedy the identified problems”. In addition to the editing process, the editing strategy also covers many other things connected with operating the editing process in different contexts.

GSBPM AND THE PHASES OF CREATING AND USING THE EDITING STRATEGY

[Figure labels: 1) Designing the editing strategy; 2) Creating the realization system of the strategy; 3) Testing the strategy; 4) Applying the strategy in production (4.3 for editing during data acquisition)]

Generic E&I process from GSDEM


Process step definitions in GSDEM

-  The editing process can be structured by splitting it up into sub-processes, called process steps, and a process flow that describes the navigation among the process steps during execution.

-  An operational data editing process usually contains a considerable number of functions with specified methods that are executed in an organised way. The function types are Review, Selection and Amendment. A function is an instance of one of the three function types that serves a specific purpose in the chain of activities that leads to the edited data.

-  Data editing functions specify what action is to be performed in terms of its purpose, but not how it is performed. The latter is specified by the process method. For the purposes of production some computerized methods are parameterized in order to define the process exactly. Methods can be interactive as well.
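The relationship between process steps, functions, methods and parameters described above can be sketched as plain data structures. This is only a minimal illustration under stated assumptions: the class names, the `flag_above_limit` method, the variable `turnover` and the limit value are all hypothetical and are not taken from GSDEM or from any existing system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List


class FunctionType(Enum):
    """The three function types named in the process step definitions."""
    REVIEW = "review"
    SELECTION = "selection"
    AMENDMENT = "amendment"


@dataclass
class Method:
    """How a function is performed; the parameters define the process exactly."""
    name: str
    run: Callable[[List[dict], Dict[str, Any]], List[dict]]
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class EditFunction:
    """What is to be performed (its purpose), independent of the method."""
    purpose: str
    function_type: FunctionType
    method: Method

    def apply(self, records: List[dict]) -> List[dict]:
        return self.method.run(records, self.method.parameters)


@dataclass
class ProcessStep:
    """A sub-process of the editing process: an ordered set of functions."""
    name: str
    functions: List[EditFunction]

    def execute(self, records: List[dict]) -> List[dict]:
        for fn in self.functions:
            records = fn.apply(records)
        return records


def flag_above_limit(records, params):
    """Hypothetical review method: mark values above a limit as suspicious."""
    var, limit = params["variable"], params["limit"]
    return [{**r, "suspicious": r[var] > limit} for r in records]


step = ProcessStep(
    name="review turnover",
    functions=[
        EditFunction(
            purpose="find implausibly large turnover",
            function_type=FunctionType.REVIEW,
            method=Method("upper limit check", flag_above_limit,
                          {"variable": "turnover", "limit": 1000}),
        )
    ],
)

result = step.execute([{"id": 1, "turnover": 120}, {"id": 2, "turnover": 5000}])
```

Swapping the method or its parameters leaves the function and the process step untouched, which mirrors the separation between what is done and how it is done.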


The connection between the four phases of creating and using the editing strategy and GSDEM is emphasized through the terms process flow, process steps, functions, methods and parameterization.

DESIGNING EDITING STRATEGY BEFORE PRODUCTION (ideal situation)

[Figure: three panels. PLANNING: a process flow (here without order) made up of process steps; a process step (F) with functions defined. DECIDING: a set of methods for functions; a process step (M) with methods decided; a process step (MP) with methods and parameters. DEVELOPING IT SYSTEM: a process flow with IT solutions; a set of modules for methods; a process step (IT solution) with method decision and parameter definition possibility.]

NEEDED IN PLANNING: Process flow, Process steps, Functions

NEEDED IN DECIDING: Methods for Functions, Parameterizations for Methods

NEEDED IN IT SOLUTIONS: Modules for Methods, Parameterizations for Modules

ONE PROCESS STEP WITH FUNCTIONS, MODULES AND PRODUCTION ACTIONS

[Figure: one process step with its functions, modules and production actions. Captions attached to the figure:]

- According to the editing process it tells what needs to be done and to be obtained as results.
- Includes user actions within the IT environment: parameter definitions, interactive operations, decisions, possibly preparations and updates.
- Provides modules to carry out the tasks given in planning with chosen methods. Supported by a metadata system for user definitions and process-based parameter production with metadata for monitoring.


METADATA IN EDITING PROCESS
NOTE: Subclassification not from the task team work, this is only some general description (not "official")

INPUT METADATA (needed for the editing process, i.e. referential metadata)
- User-made metadata at the production stage (e.g. parameter definitions, rules for data treatment)
- Imported metadata (predefined metadata with e.g. parameters, auxiliary data*, other non-statistical data*)
- Process-made metadata (mainly function indicators and derived variables for further use)

OUTPUT METADATA (produced in the editing process, i.e. paradata)
- Function indicator metadata (may be used as subsequent input metadata but also information about editing history of a unit)
- Process metrics metadata (metrics describing the process and its quality)
- Process information metadata (what happens, what is used, exceptional situations, warnings …)

STEERING THE PROCESS mainly via metadata for parameterization (e.g. edit rules, limits, method-dependent parameters) with some decisions and definitions during the process.

MONITORING AND EVALUATING THE PROCESS mainly via output metadata of process indicators and process information together with experiences and special studies sometimes.
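The split between input metadata that steers the process and output metadata that describes it can be illustrated with a small sketch. It assumes a single limit rule given as user-made input metadata; the function `run_limit_edit`, the dictionary keys and the indicator values are hypothetical names chosen for this example only.

```python
def run_limit_edit(records, input_metadata):
    """Apply one limit rule steered by input metadata and return the paradata."""
    rule = input_metadata["edit_rule"]
    var, low, high = rule["variable"], rule["lower"], rule["upper"]

    indicators = {}   # function indicator metadata, one entry per unit
    information = []  # process information metadata (warnings etc.)
    for rec in records:
        value = rec.get(var)
        if value is None:
            information.append(f"unit {rec['id']}: {var} missing, rule not evaluated")
            indicators[rec["id"]] = "not_evaluated"
        elif low <= value <= high:
            indicators[rec["id"]] = "passed"
        else:
            indicators[rec["id"]] = "failed"

    evaluated = [v for v in indicators.values() if v != "not_evaluated"]
    metrics = {       # process metrics metadata
        "units": len(records),
        "evaluated": len(evaluated),
        "failure_rate": evaluated.count("failed") / len(evaluated) if evaluated else None,
    }
    return {
        "function_indicators": indicators,
        "process_metrics": metrics,
        "process_information": information,
    }


# User-made input metadata: the parameterization of the rule.
input_metadata = {"edit_rule": {"variable": "employees", "lower": 0, "upper": 100}}
records = [
    {"id": 1, "employees": 50},
    {"id": 2, "employees": 500},
    {"id": 3, "employees": None},
]
paradata = run_limit_edit(records, input_metadata)
```

The function indicators could be fed to a later process step as input metadata, while the metrics and information entries serve monitoring and evaluation.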

Planning, production and development tasks

PLANNING TASKS
- Process flow creation
- IT system for editing
- Process step creation
- Function creation
- Method module preparations
- Method selections
- Parameterization
As seen in previous slides. In many cases some of these tasks aren't carried out in a very systematic way.

PRODUCTION TASKS
A. PREDEFINED TASKS
- Data preparations*
- Module preparations*
- Parameter definitions
- Submitting modules
- Monitoring process
- User decisions*
- Interactive treatment
B. UNEXPECTED TASKS
The tasks which should be carried out according to the process flow are predefined. Unexpected tasks are caused by varying problematic situations (separate slide).

DEVELOPMENT TASKS
- Changes in process flow
- Changes in systems carrying out process flow
- Changes in process steps
- Changes in functions
- Changes in method modules
- Changes in methods
- Changes in parameterization
Possible development tasks are based on metadata for monitoring and evaluation together with experiences during the process and special studies. A separate slide describes these tasks.

Problematic situations leading to unexpected tasks in production (examples)

[Figure: the editing process in production surrounded by possible problem sources: statistical data, auxiliary data, input metadata, definitions, IT system & modules, IT environments, process.]

•  Insufficiencies or substance changes → extra studies, further data processing, …

•  Method- or process-unsuitable situations in data → reclassification, unit aggregation, method revision, …

•  Insufficiencies and errors in modules / programs → error checks, programming, …

•  Unsuitable modules for some situations → improving programs / modules (only if possible in the latter), …

•  Wrong definitions for process or modules → finding out right ways to define, adjusting instructions, …

•  Location, transfer, conversion and/or preparation problems for data sets → various solutions in the corresponding environment

•  Conflicts between statistical and auxiliary data → extra studies, further data processing, …

•  Problems in data sets in different time points → harmonization attempts, …

•  Insufficient or wrong indicator data → changing procedures or definitions behind the indicator data, adjusting the current process …

•  Process not ready for some real situations → making additional modules / programs, …

DEVELOPMENT TASKS AT DIFFERENT LEVELS (1)

Changes in process flow
•  Rather exceptional, and usually tied to large-scale projects that aim to improve the efficiency of the editing process, possibly following the idea of harmonizing the editing process among the statistics in the statistical office.

•  The editing project at Statistics Finland opened the possibility to revise the process flow structure in some statistics, especially by including the selective editing process step in it.

Changes in systems carrying out process steps
•  Substantial changes in the IT system are quite rare, but they are made especially when there is a need for a more systematic processing system with an all-covering metadata structure and calculation of indicators at different levels.

•  A SAS EG application called EG EDIT was constructed to utilize the properties of the BANFF and SELEKT packages, with additional macro modules for the needs in designing and implementing the editing process of various statistics. The package has a metadata system collecting necessary information about the process and definitions for the process.

•  The definitional metadata for steering the process (collected automatically from the definitions given in EG EDIT) is available in full during the process. The parameterization of the current process is one part of the definitional metadata. One feature of the process-like project form in EG EDIT is that process metrics are created automatically while the application is running.


DEVELOPMENT TASKS AT DIFFERENT LEVELS (2)

Changes in process steps
•  The process steps can change (or new steps can emerge) if there is a need for revision in some part of the process flow.

•  A process step added to the process flow in some statistics was the automatic correction of observed fatal errors with exact solutions, mainly dealing with thousand errors. A more complex process step with new functional elements for many statistics was selective editing.
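An automatic correction step for thousand errors could look roughly like the following. This is a hedged sketch, not the procedure used in EG EDIT or at Statistics Finland: it assumes that a reference value (here a hypothetical previous-round variable) is available for each unit, and that a reported value roughly 1000 times the reference indicates reporting in the wrong unit. The ratio bounds 300 and 3000 are illustrative parameters only.

```python
def correct_thousand_errors(records, variable, reference, low=300.0, high=3000.0):
    """Divide a value by 1000 when its ratio to a reference value suggests
    that it was reported in units instead of thousands.

    Returns new records with a function indicator 'thousand_error'.
    """
    corrected = []
    for rec in records:
        rec = dict(rec)  # do not modify the caller's data
        value, ref = rec[variable], rec[reference]
        rec["thousand_error"] = False
        if value is not None and ref is not None and ref > 0:
            ratio = value / ref
            if low <= ratio <= high:
                rec[variable] = value / 1000
                rec["thousand_error"] = True
        corrected.append(rec)
    return corrected


data = [
    {"id": 1, "turnover": 250000, "turnover_prev": 240},  # likely thousand error
    {"id": 2, "turnover": 260, "turnover_prev": 240},     # plausible as reported
    {"id": 3, "turnover": 900, "turnover_prev": None},    # no reference available
]
edited = correct_thousand_errors(data, "turnover", "turnover_prev")
```

The indicator written to each record is an example of function indicator metadata that documents the editing history of the unit.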

Changes in functions
•  There may be changes in functions appearing in the process steps, when some renewal is carried out in parts of the statistical process. An example of this is the inclusion of the score calculation functions to the statistics in the development project.
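To make the idea of a score calculation function concrete, the sketch below uses a common textbook form of a local score in selective editing: the weighted absolute difference between the reported and an anticipated value, relative to an estimated domain total. It is an assumption made for illustration and does not reproduce the score functions of SELEKT or EG EDIT; the field names, the anticipated values and the threshold are hypothetical.

```python
def local_score(reported, anticipated, weight, domain_total):
    """Weighted potential impact of a unit on the estimated domain total."""
    return weight * abs(reported - anticipated) / domain_total


def select_for_review(units, domain_total, threshold):
    """Return unit ids whose score exceeds the threshold, largest score first."""
    scored = [
        (u["id"], local_score(u["reported"], u["anticipated"], u["weight"], domain_total))
        for u in units
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [unit_id for unit_id, score in scored if score > threshold]


units = [
    {"id": "A", "reported": 1200, "anticipated": 1000, "weight": 10},
    {"id": "B", "reported": 105, "anticipated": 100, "weight": 50},
    {"id": "C", "reported": 9000, "anticipated": 3000, "weight": 5},
]
to_review = select_for_review(units, domain_total=100000, threshold=0.01)
```

Units above the threshold would go to interactive treatment, while the rest pass through or are handled automatically; the threshold is a typical parameter to tune with test data.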

Changes in method modules
•  The modules (procedures, macros, program codes etc.) for methods are subject to development especially in statistics having non-systematic structure of the IT system. Sometimes the needs for new methods require new module solutions or even tool packages.

•  EG EDIT has a set of modules in a constant basic structure and the modules are steered with separate definition blocks of parameters.

•  It is not self-evident that the modules will work in all data situations and definitions (e.g. Waste statistics with too detailed waste code levels). Some adjustments are then needed.


DEVELOPMENT TASKS AT DIFFERENT LEVELS (3)

Changes in method selections
•  The methods are changed occasionally in the process steps, mainly due to new reasoning or studies, or when new IT modules become available for more sophisticated methods.

•  Finding suitable types of edit rules was an important part in developing efficient error recognition in statistics. In addition to rules found based on error studies on suspicious phenomena in the statistical data, “tricks” for efficient error recognition in some cases were found.

Changes in parameterization
•  Generally at least some changes in parameterization are carried out between different rounds, some of them following the changes in the substance area of the statistics.

•  Probably the most common example is to adjust edit rules or parameters of rules (limits, conditions etc.) or to make new rules. When using e.g. simple limits in query edits or parameters for outlier recognition, some monitoring of the development in the field to be studied is considered good practice, and there are plans to build this kind of mechanism into the process of some statistics.

•  Parameterization work is rather frequent while the editing process of a statistic is being developed, especially the exact definition of the edit rules. This work needs substance knowledge of the statistic in question.
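One simple way to monitor whether fixed limits in a query edit still fit the data is to follow the share of flagged values from round to round. The sketch below assumes that a rising flag rate signals that the limits need revisiting; the function names, the limits and the tolerated rate are hypothetical and the data are made up for the example.

```python
def flag_rate(values, lower, upper):
    """Share of values falling outside the limits of a query edit."""
    if not values:
        return 0.0
    flagged = sum(1 for v in values if not (lower <= v <= upper))
    return flagged / len(values)


def rounds_needing_review(rounds, lower, upper, max_rate):
    """Return the rounds whose flag rate exceeds the tolerated rate."""
    result = {}
    for name, values in rounds.items():
        rate = flag_rate(values, lower, upper)
        if rate > max_rate:
            result[name] = rate
    return result


rounds = {
    "2014": [10, 12, 15, 40],
    "2015": [14, 22, 25, 41],
}
suspect = rounds_needing_review(rounds, lower=5, upper=20, max_rate=0.3)
```

In this made-up case the level of the variable has shifted upwards in the later round, so most values fail the old limits; the limits rather than the data are the likely problem, which is the situation the monitoring is meant to reveal.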


Conclusions (in a bit straightforward manner, not official)

•  Construct your editing process flow well, with sufficiently defined process steps containing the editing functions needed in real statistics production. Don’t leave “holes” in the process. Use the GSDEM suggestions for help in the task.

•  Study the methods available for functions together with experiences and recommendations.

•  Test the method alternatives with real data and adjust parameters to the data situation.

•  If possible, utilize existing IT modules / programs / application parts. Connect IT personnel to the development of the process and the metadata environment.

•  Try to minimize programming and non-parameterized data preparation and process work in production. Always try to improve the system in order to avoid unexpected tasks in the future. However, process-included interactive treatment is not an unexpected task.

•  Monitor the whole process while it is being conducted, and analyze the metadata describing the process together with experiences for further development.

•  Test the methodological choices from time to time.
