19.11.2004 Toni Räikkönen
Data Collection in Statistics Finland
now and in the Future
Topics
General background of the data collection in Statistics Finland
Internet-based data collection
  Self-made web data collection applications
  XCola (XML-based Collection Application)
Primary objectives in data collection
reduce data supply burden of respondents
speed up data production
lower data collection costs
improve the quality of data
remove overlapping collection and promote joint use of the collected data between different authorities
Background
About 96 % of the data is collected from administrative registers
About 4 % of the data is collected directly from respondents
  paper forms, Excel sheets
  web collection applications
  interviews by CATI/CAPI systems, mainly using Blaise software
Result agreement with the Ministry of Finance
  All respondents (enterprises, communes, schools) should have the possibility to transmit their data electronically by the end of 2006.
Web Data Collection Applications in Statistics Finland 15.11.2004
Inquiry | Unit | Ready for production | Status | Software | Made by | Maintenance
1. Self-made web collection applications
Rakennuskustannusindeksi | YS | 2001 | in production | VB6 | TK/Räikkönen | Räikkönen
Aluebarometri | EL | 2001 | in production | VB6 | TK/Räikkönen | Räikkönen
Myyntitiedustelu | YS | 2002 | in production | ASP.NET | TK/Hedman | Esikot / Lanu
Teollisuuden volyymi-indeksi | YS | 2002 | in production | ASP.NET | TK/Hedman | Esikot / Lanu
Varastotiedustelu | YS | 2003 | in production | ASP.NET | TK/Hedman | Esikot / Lanu
Optiotiedustelu | YS | 2003 | in production | ASP.NET | TK/Hedman | Esikot / Lanu
Energiankäyttötiedustelu | YR | 2004 | in production | ASP.NET | TK/Piela | Maarit Asp
Tuottajahintaindeksi | HP | 1.10.2004 | in test | ASP.NET | TK/Asp | Maarit Asp
Kuntien toimintayksiköt | HE | 30.11.2004 | in test | ASP.NET | TK/Piela | ?
Majoitustilasto | YS | 1.1.2005 | under construction | XCola | TK/Snellman | Kesete ?
Teollisuuden uudet tilaukset | YS | 1.1.2005 | under construction | XCola | TK/Snellman | Kesete ?
Varallisuustutkimus | EL | 1.1.2005 | under construction | Blaise IS | TK/Lauri | Lauri ?
Palvelujen hintaindeksi | HP | 2005 | planned | ASP.NET | TK/Asp | ?
Maatilatalouden yritys- ja tulotilasto | TO | 2005 | planned | XCola | TK/Räikkönen | Topso / Helander
Tavarankuljetus, kotimaan liikenne | YS | 2005 | planned | XCola | TK/Kesete | ?
Tavarankuljetus, ulkomaan liikenne | YS | 2005 | planned | XCola | TK/Kesete | ?
Rakennusyritysten korjausrakentaminen | YS | 2005 | planned | XCola | ? |
YTR:n yksitoimipaikkaiset | YS | 2005 | planned | XCola | ? | YREK
YTR:n laatutiedustelu | YS | 2006 | planned | XCola | ? | YREK
YTR:n monitoimipaikkaiset | YS | 2006 | planned | XCola | ? | YREK
YTR:n uusien tiedustelu | YS | 2006 | planned | XCola | ? | YREK
Kulutustutkimus | EL | 2006 | planned | Blaise IS | TK |
Inquiry | Unit | Ready for production | Status | Software made by
2. Web collection applications (have been / being / will be) built outside of Statistics Finland
Peruskoulut | HE | 1997 | in production | ELMA
Oppilaitokset ja opiskelijat | HE | 2000 | in production | ELMA
Lukioiden oppilasvalinnat | HE | | in production | ELMA
Lukioiden henk.pohj. opiskelija-ain. | HE | | in production | ELMA
Kuntatiedonkeruu | | | in production | ELMA
Kuntien neljännesvuositilasto | TO | | in production | ELMA
Kuntien toimintatilasto | TO | 2003 | in production | ELMA
Kuntien taloustilasto I | TO | 2003 | in production | ELMA
Kuntien taloustilasto II | TO | 2003 | in production | ELMA
Kuntien palkkatiedustelu | HP | 2004 | in production | ELMA
Teollisuuden toimipaikkatiedustelu T5 | YR | 2004 | in production | ELMA
Ajankäyttötiedustelu | YR | 2004 | in production | ELMA
Tietotekn. ja sähk. kauppa yrityksissä | YR | 2005 | under construction | ELMA
Palvelujen ulkomaankauppa | YR | 2005 | under construction | ELMA
Yritysten tutkimus ja kehittäminen | YR | 2005 | planned, ordered | ELMA
Ympäristönsuojelumenot | YR | 2005 | planned, ordered | ELMA
Televiestintä | YR | 2005 | planned, ordered | ELMA
Tilinpäätöstietojen lisätiedot TILKES | YR | 2005 | planned | ELMA ?
Yksityisen sektorin palkat | HP | 2005 | planned | ELMA ?
Julkisen sektorin tutkimus ja kehittäminen | YR | 2006 | planned | ELMA ?
Linja-autoliikenne | YR | 2006 | planned | ELMA ?
Energiantuotanto | YR | 2006 | planned | ELMA ?
Hyödyketiedustelu | YR | 2006 | planned | ELMA ?
3. Other paper form inquiries waiting for a web collection application
Kuntien henkilöstötiedustelu | HE | 2005 | planned | Statistics Finland ?
Valtion tuottavuustilastointi | TO | 2005 | planned | ?
Luottokorttitilasto | TO | 2005 | planned | ?
Maa- ja metsätyöntekijöiden palkat | HP | 2005 | planned | ?
Yritysten innovaatio | YR | 2005 | planned | ?
Ammattitiedustelu | HE | 2006 | planned | ?
Asunto-osakeyhtiöiden taloustilasto | HP | 2006 | planned | ?
Kuorma-autoliikenteen kustannukset | HP | 2006 | planned | ?
Kaupan määräaikaisselvitys | YR | 2007 | planned | ?
Työvoimakustannustutkimus | HP | 2008 | planned | ?
Pääomakantatiedustelu | YR | ? | planned | ?
Palvelujen hyödykekysely | YR | ? | planned | ?
Data collection in Statistics Finland by type and media used
Indirect data collection: 94 %
EDI: 100%
Direct data collection: 6 %
CAPI: 2%
Paper: 2%
EDI: 2%
Data flows
Different types of data flows
  data are needed only by Statistics Finland
  the same data are needed by several administrative organizations
  interviews made by CATI/CAPI system
Different solutions
  using external teleoperator for distributing data to different data collectors (TYVI model)
  self-made web-based system
  Blaise solution for carrying out interviews
The TYVI model: Data Flows from Enterprises to Authorities
  interfaces and transmission
  data capture
  data refining
  management of user accounts
Participants
  The enterprises
  The TYVI operators
  The authorities
The authority does not need to maintain a many-to-many relationship with the respondents
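The hub idea behind the TYVI model can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class `TyviOperator`, the form type `wage_report` and the routing rule are hypothetical and do not describe the actual TYVI interfaces. The sketch also assumes a single operator in the middle, whereas the model allows several.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TyviOperator:
    """Hypothetical hub that forwards each report to the authorities subscribed to it."""

    # Maps a form type to the authorities that want it (illustrative names only).
    subscriptions: dict[str, list[str]] = field(default_factory=dict)

    def subscribe(self, authority: str, form_type: str) -> None:
        self.subscriptions.setdefault(form_type, []).append(authority)

    def route(self, form_type: str, payload: dict) -> dict[str, dict]:
        """Return one copy of the payload per receiving authority."""
        return {a: dict(payload) for a in self.subscriptions.get(form_type, [])}


def direct_links(enterprises: int, authorities: int) -> int:
    # Without an operator, every enterprise connects to every authority.
    return enterprises * authorities


def hub_links(enterprises: int, authorities: int) -> int:
    # With one operator in the middle, each party keeps a single connection.
    return enterprises + authorities


operator = TyviOperator()
operator.subscribe("Statistics Finland", "wage_report")
operator.subscribe("Tax Administration", "wage_report")
delivered = operator.route("wage_report", {"employees": 12})
```

The enterprise sends the report once, and the operator delivers a copy to each authority that needs it. With 300 respondents and 3 authorities, the direct approach would need 900 connections and the hub approach 303.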
[Figure: The TYVI model (Vallaskangas 1998). Diagram labels: Enterprises, software house, accounting office; TYVI operators; Tax Administration, Statistics Finland, others; FTP/HTTP.]
Internet-based collection of data
Case: Building Cost Index
General background
Fall 2000
  All existing electronic data collections were handled by 3rd party operators (TYVI model)
  The production system of Building Cost Index was under reconstruction and lacked web-based data collection
About Building Cost Index (Business Trends)
~300 respondents (hardware stores, wholesale stores, plumbing stores etc.)
Price information of 1-15 products collected from each respondent every month
Paper forms are usually sent on the 15th day and expected back around the 25th day
The design goals of the web system
Provide means of web based collection of statistical data
No extra burden on respondents (no installations, no JavaScript-based solutions etc.)
“Live” feedback to the respondents (upon validations etc.)
Hardware architecture
Running on Windows NT server
Web server: Microsoft Internet Information Server 4 (IIS4)
Component server: Microsoft Transaction Server 2.0
Anonymous access (no NT authentication)
Database server
  Windows 2000 server
  Running Microsoft SQL Server 2000
Deployed on DMZ, accessible only through firewall
Application architecture
Built using Microsoft Windows DNA (Windows Distributed interNet Applications Architecture)
Standard 3-tier architecture that consists of
  Presentation layer: HTML, ASP
  Business layer: COM components
  Database layer: Relational database
System consists of two separate modules (both self-made)
  User authentication
  Data collection
Experiences
Beta phase
  5/2001 - 9/2001: 30 respondents
  9/2001 - 2/2002: 70 users
In 3/2002 the system was opened to all respondents
147 users at the moment (nearly 50%)
Internet-based collection of data
CASE: Business Trends’ collection systems
technical aspects
Design goals
Create framework for similar systems
  Multi-language support
  LDAP-based user authentication w/ centralized administration
Create generic method for transferring data between collection and production databases
Create “mass emailer” for all kinds of collection systems
Software & hardware architecture
Built using Microsoft .NET and ASP.NET
Generic 3-tier architecture w/ presentation, business and database logic
Collection database separated from the production database
128-bit encryption used for communication between respondents and Statistics Finland
Framework of the collection system
The modular structure of the framework makes it possible to
  change menus, headers, footers and other styles
  add custom functionality (using ASP.NET user controls) on the pages
  add and load different languages for the pages
The base use cases are more or less the same in different collection systems (login, questionnaire, feedback, instructions and contact information)
Multi-language support
Most of the textual information on the web pages is stored in the database
Texts are loaded into the server's memory at system startup
Only long descriptions are kept as files
Page language can be changed “on the fly”
Every element has a tag on the page template and the relevant text is attached to the element upon the page load
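The tag-based text lookup described above could work roughly as in the following sketch. It is written in Python rather than ASP.NET and rests on several assumptions: the table name `page_text`, the `{{tag}}` placeholder syntax and the fallback to Finnish are all hypothetical and are not taken from the actual system.

```python
import re
import sqlite3

TAG = re.compile(r"\{\{(\w+)\}\}")  # hypothetical tag syntax in page templates


def load_texts(conn: sqlite3.Connection) -> dict:
    """Read all short texts once (e.g. at startup) into a {(lang, tag): text} cache."""
    rows = conn.execute("SELECT lang, tag, text FROM page_text")
    return {(lang, tag): text for lang, tag, text in rows}


def render(template: str, lang: str, texts: dict, fallback: str = "fi") -> str:
    """Replace every {{tag}} in the template with the text for the requested language."""

    def lookup(match):
        tag = match.group(1)
        # Fall back to the default language, then to the bare tag name.
        return texts.get((lang, tag), texts.get((fallback, tag), tag))

    return TAG.sub(lookup, template)


# Demo data standing in for the text table of the collection database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_text (lang TEXT, tag TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO page_text VALUES (?, ?, ?)",
    [
        ("fi", "login_title", "Kirjaudu"),
        ("en", "login_title", "Log in"),
        ("fi", "send_button", "Lähetä"),
    ],
)
TEXTS = load_texts(conn)
page = render("<h1>{{login_title}}</h1>", "en", TEXTS)
```

Because the cache is keyed by language and tag, changing the page language only changes the `lang` argument on the next page load; the template itself stays the same.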
User authentication
The objective was to use LDAP (Lightweight Directory Access Protocol) for the user authentication
The development for this did not proceed on schedule, so it was temporarily replaced with database-based user authentication and administration
Authentication through LDAP has been tested and it seems to be an ideal solution
At the moment we’re building a simple web administration application to finish the LDAP part
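One way to make such a temporary replacement painless is to hide the authentication backend behind a small interface. The sketch below is an assumption about how this could be structured, not a description of the actual system: the `Authenticator` interface, the `users` table and the PBKDF2 password hashing are all illustrative choices. An LDAP backend would implement the same `authenticate` method.

```python
import hashlib
import hmac
import os
import sqlite3
from typing import Protocol


class Authenticator(Protocol):
    def authenticate(self, username: str, password: str) -> bool: ...


class DatabaseAuthenticator:
    """Interim backend: salted PBKDF2 hashes kept in a users table (hypothetical schema)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY, salt BLOB, hash BLOB)"
        )

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def add_user(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self.conn.execute(
            "INSERT INTO users VALUES (?, ?, ?)",
            (username, salt, self._hash(password, salt)),
        )

    def authenticate(self, username: str, password: str) -> bool:
        row = self.conn.execute(
            "SELECT salt, hash FROM users WHERE name = ?", (username,)
        ).fetchone()
        if row is None:
            return False
        salt, stored = row
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(stored, self._hash(password, salt))


def login(backend: Authenticator, username: str, password: str) -> bool:
    # The web pages call only this function, so the backend can be swapped later.
    return backend.authenticate(username, password)


backend = DatabaseAuthenticator(sqlite3.connect(":memory:"))
backend.add_user("respondent1", "s3cret")
```

With this structure, moving from the database backend to LDAP would mean passing a different object to `login`, without touching the pages that call it.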
Data transfers
Data transfers between collection and production databases are handled with an external Win32 application
Built with PowerBuilder using the pipeline feature (data flow)
Data from the collection database is transferred to temporary tables in the production database and then synchronized with the actual tables
The solution is quite customizable: new functionality can be added by adding new pipelines
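The two-step transfer (copy into temporary tables, then synchronize with the actual tables) can be sketched as follows. The original was built with PowerBuilder pipelines; this sketch uses Python and SQLite instead, and the table names `answers` and `answers_staging`, the columns and the overwrite-on-conflict rule are assumptions made for illustration.

```python
import sqlite3


def transfer(collection: sqlite3.Connection, production: sqlite3.Connection) -> int:
    """Copy collected rows into a staging table, then merge them into the real table."""
    rows = collection.execute(
        "SELECT respondent_id, period, value FROM answers"
    ).fetchall()

    with production:  # one transaction: commit on success, roll back on error
        production.execute("DELETE FROM answers_staging")
        production.executemany("INSERT INTO answers_staging VALUES (?, ?, ?)", rows)
        # Synchronize: new keys are inserted, existing keys are overwritten.
        production.execute(
            "INSERT OR REPLACE INTO answers (respondent_id, period, value) "
            "SELECT respondent_id, period, value FROM answers_staging"
        )
    return len(rows)


SCHEMA = "CREATE TABLE {name} (respondent_id INTEGER, period TEXT, value REAL{pk})"
PK = ", PRIMARY KEY (respondent_id, period)"

# Collection database with two newly received answers.
collection = sqlite3.connect(":memory:")
collection.execute(SCHEMA.format(name="answers", pk=""))
collection.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [(1, "2004-10", 12.5), (2, "2004-10", 8.0)],
)

# Production database that already holds an older value for respondent 1.
production = sqlite3.connect(":memory:")
production.execute(SCHEMA.format(name="answers", pk=PK))
production.execute(SCHEMA.format(name="answers_staging", pk=""))
production.execute("INSERT INTO answers VALUES (1, '2004-10', 11.0)")

moved = transfer(collection, production)
```

Staging the rows first keeps half-transferred data out of the production tables: if the copy fails, the transaction rolls back and the actual table is unchanged.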
Mass emailer
An external application was built with Visual Basic 6 to send emails to the respondents
Modular approach
  New systems can be added using textual configuration files
  Reply requests can be added by writing SQL statements to the configuration files
Supports attachments
Replaces traditional letters
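A configuration-driven emailer of this kind might look like the sketch below. The original was written in Visual Basic 6; this version uses Python, and the configuration keys (`sender`, `subject`, `body`, `recipients_sql`), the `respondents` table and the addresses are all hypothetical. Actual sending and attachments are left out.

```python
import configparser
import sqlite3
from email.message import EmailMessage

# One section per collection system; adding a system means adding a section.
CONFIG = """
[sales_inquiry]
sender = collection@example.org
subject = Sales inquiry reminder
body = Dear respondent, your data for this month is still missing.
recipients_sql = SELECT email FROM respondents WHERE answered = 0
"""


def build_messages(config_text: str, system: str, conn: sqlite3.Connection) -> list:
    """Create one message per address returned by the configured SQL statement."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser[system]
    messages = []
    for (address,) in conn.execute(section["recipients_sql"]):
        msg = EmailMessage()
        msg["From"] = section["sender"]
        msg["To"] = address
        msg["Subject"] = section["subject"]
        msg.set_content(section["body"])
        messages.append(msg)
    return messages  # sending (e.g. via smtplib) is left out of this sketch


# Demo data: one respondent has answered, one has not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE respondents (email TEXT, answered INTEGER)")
conn.executemany(
    "INSERT INTO respondents VALUES (?, ?)",
    [("a@example.org", 0), ("b@example.org", 1)],
)
messages = build_messages(CONFIG, "sales_inquiry", conn)
```

Keeping the recipient query in the configuration file means a reminder rule can be changed without rebuilding the application.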
Development experiences
Microsoft .NET had just been released when the development began
The development environment wasn’t always stable and the developers experienced quite a lot of unexpected behavior
Despite this, ASP.NET is quite an improvement compared with other web application technologies (ASP, PHP, Perl etc.)
Cross-browser compatibility is still quite poor, however
Effects of the electronic data supply system on data collection process
Before | With the electronic data supply system
Printing the questionnaires | Transferring data to collection database
Mailing | E-mail informing (mass emailer)
Receiving the questionnaires (mail, fax, e-mail, TYVI) | (Electronic data supply)
Validating and entering the data | Mass validation
Printing and mailing the reminders | E-mail reminder (mass emailer)
Phone inquiry | Phone inquiry
Non-individual delayed feedback | Individual direct feedback
Limited access to previous own data | Previous own data available
Manual exclusive treatment | Electronic mass treatment
Results (1): Sales inquiry
Share of all respondents using the electronic data supply system:
after 1st month: 48%
after 2nd month: 59%
after 3rd month: 61%
from 4th month on: 70%
Today: 75 - 80%
Results (2): Sales inquiry
Reminders sent:
before electronic data supply system: ~1000
after 1st month: ~800
after 2nd month: ~700
after 3rd month: ~600
from 4th month on: ~500
Experiences (1)
Feedback from respondents has been very positive:
Response burden has been reduced remarkably
Enthusiasm of persons involved in data collection
Manual data treatment has been reduced (by at least 50%)
Quality of data has improved: validation, additional information if data is not comparable etc.
Experiences (2)
Number of enquiries made by respondents concerning the electronic data supply system:
first two months: ~100 / month (mainly questions concerning base settings)
since third month: ~30 / month (mainly forgotten passwords)
Development ideas
Although the framework is quite good, some ideas have arisen
Use of XML to
  Define the concepts of the questionnaires
  Define the presentation (XSLT)
  Define the validations
Replace the user authentication with LDAP
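The idea of defining questionnaire concepts and validations in XML can be illustrated with a small sketch. The element and attribute names (`questionnaire`, `field`, `min`, `max`) are invented for this example and are not the actual XCola schema; the XSLT-based presentation is not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical questionnaire definition; element and attribute names are illustrative.
DEFINITION = """
<questionnaire id="accommodation">
  <field name="rooms" label="Number of rooms" type="int" min="1" max="5000"/>
  <field name="nights" label="Nights spent" type="int" min="0"/>
</questionnaire>
"""


def validate(definition_xml: str, answers: dict) -> list:
    """Check the answers against the rules in the definition; return error messages."""
    errors = []
    for field in ET.fromstring(definition_xml.strip()).iter("field"):
        name, label = field.get("name"), field.get("label")
        try:
            value = int(answers.get(name, ""))
        except (TypeError, ValueError):
            errors.append(f"{label}: a whole number is required")
            continue
        # Range limits are optional attributes of the field definition.
        if field.get("min") is not None and value < int(field.get("min")):
            errors.append(f"{label}: must be at least {field.get('min')}")
        if field.get("max") is not None and value > int(field.get("max")):
            errors.append(f"{label}: must be at most {field.get('max')}")
    return errors


ok = validate(DEFINITION, {"rooms": "40", "nights": "120"})
problems = validate(DEFINITION, {"rooms": "0", "nights": "x"})
```

Because the rules live in the definition file, a new questionnaire needs a new XML document rather than new validation code.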
Benefits
Enables
  Complex validations of the data
  Dynamic creation of presentation layer logic
  Displaying of pre-fetched data to individual respondents
Live feedback to the respondents (validation errors etc.)
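A validation that uses the respondent's previous data to give immediate feedback could be sketched as follows. The 20 % threshold, the function name `check_price` and the message texts are assumptions made for illustration, not the rules of the actual system.

```python
def check_price(previous: float, reported: float, max_change: float = 0.2,
                comment: str = "") -> list:
    """Return feedback messages for one reported price (empty list means accepted)."""
    messages = []
    if reported <= 0:
        messages.append("Price must be greater than zero.")
        return messages
    if previous <= 0:
        # No usable previous value to compare with, so accept the price as given.
        return messages
    change = (reported - previous) / previous
    # Large changes are accepted only with an explanation from the respondent.
    if abs(change) > max_change and not comment.strip():
        messages.append(
            f"Price changed by {change:+.0%} from the previous month; "
            "please add a comment explaining the change."
        )
    return messages


small_change = check_price(100.0, 105.0)
large_change = check_price(100.0, 130.0)
explained = check_price(100.0, 130.0, comment="new supplier")
```

Here a 5 % change passes silently, a 30 % change triggers a request for a comment, and the same change is accepted once the respondent has explained it.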
Drawbacks
Requires user/customer administration for
  Maintaining user profiles
  Helpdesk/Support services
Internet-based collection of data
CASE: Accommodation statistics
XML-based form