
Research Article

The WISERD Geoportal: A Tool for the Discovery, Analysis and Visualization of Socio-economic (Meta-) Data for Wales

Richard Fry, WISERD, GIS Research Centre, University of Glamorgan

Robert Berry, WISERD, GIS Research Centre, University of Glamorgan

Gary Higgs, WISERD, GIS Research Centre, University of Glamorgan

Scott Orford, WISERD, School of City and Regional Planning, Cardiff University

Sam Jones, WISERD, Cardiff University

Abstract

The Wales Institute of Socio-Economic Research, Data and Methods (WISERD) is an interdisciplinary, cross-institutional academic research group based in Wales, UK. One of the key objectives of WISERD is to develop a spatial framework that enhances a researcher’s ability to discover socio-economic research data relating to Wales, with the aim of encouraging collaborative research and re-use of existing data. This article describes the development of an online geoportal designed to meet this objective. Using free and open-source software (FOSS) components and services, a range of software tools has been developed to capture standards-compliant metadata for a variety of data sources. The geoportal is unique in that, in our review of over 120 geoportals worldwide, we have not previously encountered a geoportal dedicated to supporting quantitative and qualitative social science academic and policy research. A particularly innovative aspect of the geoportal has involved the building of a rich meta-database of government surveys, geo-referenced semantically-tagged qualitative data (generated from primary research), ‘grey’ data (e.g. from transcripts, journal publications, books, PhD theses) and Government administrative data. This article describes the challenges faced during the development of the WISERD Geoportal, which can be accessed via http://www.wiserd.ac.uk/geoportal/.

Address for correspondence: Richard Fry, Centre for Health Information, Research and Evaluation (CHIRAL), College of Medicine, Floor 3, Institute of Life Science 2 (ILS2), Swansea University, Singleton Park, Swansea, UK SA2 8PP.

Transactions in GIS, 2012, 16(2): 105–124

© 2012 Blackwell Publishing Ltd
doi: 10.1111/j.1467-9671.2012.01308.x

1 Introduction

The Wales Institute of Social and Economic Research, Data and Methods (WISERD) is a networked research institute of five universities in Wales which was established in October 2008. The overarching aims of WISERD are to enhance the quality and quantity of social science research in Wales, to develop the social science research infrastructure, and to encourage collaborative research between academics and policy makers. An important element of WISERD, and the focus of this article, is the development of a social science geoportal as one approach to addressing these aims. The geoportal facilitates the discovery, analysis and visualization of different types of social science data, and associated meta-data, as a means of enhancing empirically informed social science research and encouraging collaboration between individuals and groups. It has been created specifically for researchers who may be unfamiliar with geospatial technologies but who need to locate curated data sources relating to a particular theme (e.g. health, crime, economy, transport, etc.) and/or for a particular locality. In a time of tight fiscal budgets and funding constraints, the geoportal has been conceived, designed and developed in the spirit of the repurposing and re-use of existing data sources, as endorsed by the UK government in its National Data Strategy (2009) and by current U.S. federal policies that promote secondary use of federally funded data and mandate data sharing within the research community (Trinidad et al. 2011). Crucially, it also recognizes the varied and disparate nature of social science data. A diverse range of data sources is available to social scientists, including conventional government social science surveys, administrative data such as school records and public spending records, unofficial ‘grey’ data such as bespoke local authority surveys, data collected as part of academic research, and documents such as PhD dissertations and research reports.
These social science datasets are often maintained and stored across multiple repositories, with no mechanisms or tools to link them, and at different geographic scales, from UK-wide down to postcode level. This means that even experienced data users can find it time-consuming and difficult to navigate the various data repositories and locate useful and reliable data which may be relevant to their research interests. This seriously restricts the discovery and re-use of data, and increases the risk of duplicative data collections, wasting public resources and increasing respondent burden. In this article, we discuss the substantive inroads that have been made into the handling and dissemination of social science metadata. A particularly innovative aspect of the geoportal has involved the building of a rich, standards-compliant meta-database of government surveys, geo-referenced semantically tagged interview transcript data, ‘grey’ data and routinely collected administrative data.

A fundamental aspect of the geoportal is that it has been developed using Open Source rather than proprietary software. There are four main reasons for this. Firstly, the disparate nature of our data sources meant that no single proprietary system was capable of capturing, encoding, retrieving, displaying and disseminating the data in a way that was suitable to address our objectives. Rather, the use of Open Source software permitted a high level of flexibility and customization during development, which gave us the necessary scope for experimentation. Proprietary software is generally less flexible in this regard. Secondly, on long-term projects there is the freedom to change and share any of the components of the software at any stage, without pressure from software vendor contracts or the financial concerns surrounding licensing that may restrict decision-making. Thirdly, using free and Open Source software (FOSS) to develop the geoportal brought significant cost reductions over proprietary software solutions, which was an important consideration given our budget constraints. Finally, an important factor in adopting FOSS was the real opportunity to contribute to knowledge in the area by developing a novel type of geoportal based on innovative use of GeoFOSS components. Hence, the types of tools developed here are transferable to other geographic areas and can be enhanced by other groups of researchers, for example to include a wider range of data and applications. In addition, a recent announcement by the UK government, outlining an action plan to promote the use of FOSS at all levels of government, has added extra impetus and credibility to the Open Source movement in Britain (CIOC 2009) and provides a strong policy and political rationale for our approach.

In this article, we build on a previous article (Berry et al. 2010), which gave a more detailed background to the aims of the WISERD project as a whole and provided a conceptual design of the system, to describe the actual implementation of the geoportal. The emphasis here is therefore on the technical aspects of the WISERD Geoportal (WGP), together with an outline of its current (and projected) functionality. The remainder of the article is structured as follows. In the next section, we provide a brief summary of the existing literature on the development of geoportals before drawing on a survey of existing approaches to provide an overview of the main aims and principles behind the development of the WGP. In the third section, we describe the main functionality of the system in terms of the metadata generation tools and the WGP technology stack, and provide practical examples of the use of these tools using social and economic data for Wales. We propose further developments of the WGP in the fourth section, before providing conclusions arising from this research in the final section.

2 Background

2.1 Geoportals

There has been some debate regarding the exact definition of the term geoportal, and alternatives such as geodata portals, web-based GIS portals or geospatial portals, for example, seem to be used interchangeably in the literature (Maguire and Longley 2005). Armstrong et al. (2011, p. 114) define a geoportal as “a type of web portal used to find and access geographic information (geospatial information) and associated geographic services (e.g. display, editing and analysis) via the Internet”. They suggest that one of the key benefits of such geoportals is easier interactive access to these services and thus the widening of the potential user base, allowing them to “form a key element of the emerging spatial data infrastructure”. These services include not only web-based viewers that support online access to location-based information but also, increasingly, metadata-based search and discovery tools. Such portals should ideally permit access not only to data and metadata relating to government or national surveys but also to data collected in a range of research projects or reported in academic publications. Many of these geoportals are effectively front ends to spatial data infrastructures, permitting various degrees of access to data at local, regional and national levels or, in the case of the European Union’s INSPIRE geoportal (http://www.inspire-geoportal.eu/), at the pan-national level (see Masser 2011 for a recent overview). In addition, some geoportals have been developed with narrower thematic remits in mind, for example for the management of subsurface data in the Netherlands (Lance et al. 2011) or to access data on risk from natural hazards at the global scale (Giuliani and Peduzzi 2011). However, in our review of over 120 geoportals worldwide, we have not previously encountered a geoportal dedicated to supporting quantitative and qualitative social science academic and policy research.

As posited by Harris et al. (2010, p. 133), “the development of mechanisms for facilitating the discovery, location and connection to that data is now a primary focus of research”. Some of the earlier research on the use of web-based tools has been documented, for example, by Goodchild et al. (2007) in relation to the Geospatial One-Stop (GOS) web portal in the U.S., and by Beaumont et al. (2005) for examples from the UK. Many of the design issues will be common to other Internet portals developed as part of spatial data infrastructures permitting access to geographic information (e.g. Tait 2005, Tang and Selwood 2005, Nyerges et al. 2007, Yang et al. 2007). Typically the functionality will consist of core modules: mapping, graphing, data download (including access to metadata), data extraction and web services (Giuliani and Peduzzi 2011). There is an increasing number of proprietary geoportal solutions being made available; however, the principal aim of the project reported here has been to use Open Source software to create a (meta-) data discovery, analysis and visualization tool. The Open Geospatial Consortium (2004, p. 1) highlights four classes of service that are needed to “procure a comprehensive geospatial portal implementation”, and these have directed our own geoportal development: namely portal services (to provide a single access point and management/administration), catalog services, portrayal services and data services. Technical issues around the design of the WGP architecture are highlighted in the remainder of the article.

2.2 Overview of the WISERD Geoportal

The WISERD Geoportal is an innovative GIS-based web application designed to enhance the discovery, integration, visualization and retrieval of social science research data in Wales. Fundamental to data discovery, integration and retrieval is consistent and detailed metadata compliant with national and international standards (AGI 2009), and hence the WGP has been concerned with metadata development. Implementing the WGP has involved the development of a suite of bespoke software tools consisting of two metadata capture tools (one for quantitative survey and administrative data, and one for qualitative and ‘grey’ data), an archiving system (to archive primary WISERD research) and a cartographic web-based end-user interface for interrogating the meta-databases. This permits potential users not only to know about the existence of, and source for, a specific data set but also to determine the provenance of the data, enabling them to make judgements on its fitness for use. Each of the tools has been developed using a number of FOSS components, designed to capture, manage and publish the metadata in accordance with internationally recognized metadata standards such as Dublin Core, DDI 3.0, ISO 19115 and UK GEMINI 2/INSPIRE. In this article, we focus specifically on the development and functionality of the online geoportal itself and its mapping/search interface, which has a number of tools that enable end users to search, examine and visualize metadata from numerous metadata repositories held by WISERD. These repositories include question-level socio-economic survey metadata, qualitative interview transcript record metadata, ‘grey’ data, administrative data and linked data available from external sources on the Internet (e.g. www.data.gov.uk).
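As a minimal illustration of what "standards-compliant" means for the simplest of these standards, the sketch below serializes a Dublin Core record using Python's standard `xml.etree.ElementTree` and the Dublin Core Metadata Element Set 1.1 namespace. The helper name `dublin_core_record`, the `metadata` wrapper element, the subset of elements and all field values are assumptions for illustration; this is not the WGP's actual schema or export format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# A subset of the 15 Dublin Core elements (an assumption; the WGP's field list may differ).
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "date", "type", "format", "identifier", "language", "coverage")


def dublin_core_record(fields: dict) -> str:
    """Serialize a dict of Dublin Core fields to XML (hypothetical helper)."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        value = fields.get(name)
        if value is None:
            continue  # every Dublin Core element is optional
        # Elements are repeatable, so a list produces one element per value.
        for v in (value if isinstance(value, list) else [value]):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = v
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
record = dublin_core_record({
    "title": "Example household survey, 2009",
    "publisher": "Example organization",
    "date": "2011-12-01",
    "subject": ["health", "transport"],
    "coverage": "Wales",
})
print(record)
```

Repeatable elements (here two `dc:subject` values) are one reason a flat relational table alone is awkward for Dublin Core and why a serializer like this is usually kept separate from storage.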

The WGP is built around a cartographic interface, which enables the end user to perform keyword, spatial or spatial-plus-keyword searches through a variety of different search tools. These custom tools are in addition to traditional GIS functions such as ‘identify features’, ‘measure area’, ‘measure length’, ‘pan’ and ‘zoom’. The functionality of the custom search tools is described in more detail later in the article. Other features of the user interface include a table of contents (showing search results and map layers), a user account area, print and save functions, annotation tools, help, and FAQ sections. Figure 1 shows the current user interface as presented to the end user when they first log on to the WGP.

3 Methods

3.1 Metadata Generation

In order to function effectively, the WGP relies on the quality and consistency of the underlying meta-databases. The metadata offers key information about each resource, including details about how users can access the resource to which the metadata refers.

Figure 1 The WISERD Geoportal User Interface


Table 1 summarizes the different categories of metadata stored in the meta-databases. A brief description of the software tools and metadata capture processes is included here, because these elements underpin the functionality of the WGP discussed later in the article.

The metadata generation process is split into two areas: structured quantitative survey and administrative data, and semi-structured qualitative and ‘grey’ data. To enable the generation of metadata for both these types of data, two bespoke pieces of software have been developed. Firstly, a desktop-based manual data entry program has been created that ensures users enter metadata for quantitative survey and administrative data that is compliant with the relevant international metadata standards. This software tool allows the data entry team to break down survey questionnaires and generate question- and response-level metadata for a number of different types of surveys. It also allows the generation of metadata for administrative data such as education data. The tool generates four separate metadata database tables for each survey. The first of these metadata tables contains Dublin Core metadata. Dublin Core metadata exists for every resource accessible through the WGP, regardless of data type, and contains information about the metadata itself, such as the generation date and the generating organization. The second metadata table contains metadata about the survey, including information such as survey collection dates, survey methods, sample sizes, response rates and, importantly, information about accessing the source data itself. The final two metadata tables divide each question in the survey using a ‘Question’ table and a ‘Response’ table. Question table metadata holds details such as the question type, question text, and thematic tags that allow questions to be linked to one of the WISERD research themes. Response metadata records hold details such as response types, response text, response tables, and any checks, computed variables and question routing.
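The four per-survey tables can be pictured as linked records. The sketch below models them as Python dataclasses. Every class and field name is a hypothetical stand-in drawn from the items listed above, not the WGP's actual column names, and the `questions_for_theme` lookup is an assumption about how thematic tags could be queried.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DublinCoreRecord:
    """Table 1: metadata about the metadata (exists for every WGP resource)."""
    identifier: str
    title: str
    generated_on: date
    generating_organization: str


@dataclass
class SurveyRecord:
    """Table 2: survey-level metadata, including how to access the source data."""
    survey_id: str
    dc_identifier: str         # links back to the Dublin Core record
    collection_start: date
    collection_end: date
    method: str
    sample_size: int
    response_rate: float       # proportion, 0.0 to 1.0
    data_access: str


@dataclass
class QuestionRecord:
    """Table 3: one row per question in the survey."""
    question_id: str
    survey_id: str
    question_type: str
    text: str
    thematic_tags: List[str] = field(default_factory=list)  # WISERD research themes


@dataclass
class ResponseRecord:
    """Table 4: response-level metadata, including question routing."""
    question_id: str
    response_type: str
    text: str
    routes_to: Optional[str] = None  # id of the next question, if routing applies


def questions_for_theme(questions: List[QuestionRecord], theme: str) -> List[QuestionRecord]:
    """Return the questions tagged with the given research theme."""
    return [q for q in questions if theme in q.thematic_tags]


# Illustrative records only.
dc = DublinCoreRecord("dc-1", "Example survey", date(2011, 12, 1), "Example organization")
survey = SurveyRecord("S1", "dc-1", date(2009, 1, 1), date(2009, 12, 31),
                      "face-to-face", 1200, 0.62, "Via the data archive")
qs = [QuestionRecord("Q1", "S1", "single choice", "How is your health in general?", ["health"]),
      QuestionRecord("Q2", "S1", "numeric", "How many cars does the household own?", ["transport"])]
resp = ResponseRecord("Q1", "category", "Very good", routes_to="Q2")
print([q.question_id for q in questions_for_theme(qs, "health")])
```

Keeping questions and responses in separate tables lets a single survey contribute thousands of question-level rows that can be searched and tagged individually, which is what makes question-level discovery possible.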

Table 1 Summary of the contents of the meta-databases (as of December 2011)

Data Type | Data Source Statistics | GeoSpatial Data Source
Quantitative Surveys (Government and Academic) | 96 Surveys (9,760 questions) | Official UK administrative boundary data
Administrative data | 7 Years of Education Data | Official UK administrative boundary data
Qualitative WISERD interview transcripts | 47 Interview Transcripts | Gazetteer-based geo-referenced locations (x,y coordinates)
‘Grey’ data | 9 PhD Dissertations | Gazetteer-based geo-referenced locations (x,y coordinates)

The second metadata generation software tool is a web-based qualitative metadata generation tool. This tool currently uses two secure web services to aid the generation of metadata for textual qualitative data such as interview transcripts and ‘grey’ data such as journal articles. When users upload text to the program, the data is first sent to the OpenCalais web service (http://www.opencalais.com/), which identifies rich semantic metadata such as industry terms, organizations, books and geographies using natural language processing, machine learning and other text-analysing techniques (OpenCalais 2011). This enables the data entry team to create a description of the document using a series of semantic tags that relate to its content, without impinging on the privacy or copyright issues associated with confidential research interviews. The second web service accessed by the qualitative metadata generation tool is the EDINA Unlock web service (http://unlock.edina.ac.uk/), which is used to identify place names within the text, allowing the document to be geo-referenced and spatial metadata to be generated. EDINA Unlock uses natural language and text processing techniques similar to those of OpenCalais to identify place names within a text. The web service then cross-references these place names with a number of worldwide gazetteers in order to provide a ranked ‘best guess’ of the actual location of the place names identified in the document.
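To make the idea of a ranked ‘best guess’ concrete, the sketch below ranks candidate locations returned by several gazetteers for one place name. The scoring rule (one point per gazetteer that lists a candidate, plus one point if it falls inside a rough Wales bounding box) is entirely hypothetical and is not EDINA Unlock's ranking method; the gazetteer names, the bounding box and the coordinates are approximate illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Candidate = Tuple[str, float, float]  # (label, longitude, latitude)

# Rough lon/lat box around Wales (an approximate, illustrative assumption).
WALES_BBOX = (-5.5, 51.3, -2.6, 53.5)  # (min_lon, min_lat, max_lon, max_lat)


def in_bbox(lon: float, lat: float, bbox=WALES_BBOX) -> bool:
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def rank_candidates(matches: Dict[str, List[Candidate]]) -> List[Tuple[str, float, float, int]]:
    """Rank candidates from several gazetteers (hypothetical heuristic).

    Score = number of gazetteers listing the candidate, plus 1 if it lies in the
    study area. The first coordinates seen for each label are kept.
    """
    votes: Dict[str, int] = defaultdict(int)
    coords: Dict[str, Tuple[float, float]] = {}
    for candidates in matches.values():
        for label, lon, lat in candidates:
            votes[label] += 1
            coords.setdefault(label, (lon, lat))
    scored = []
    for label, count in votes.items():
        lon, lat = coords[label]
        scored.append((label, lon, lat, count + (1 if in_bbox(lon, lat) else 0)))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda c: (-c[3], c[0]))


matches = {
    "gazetteer_a": [("Newport, Wales", -3.00, 51.59), ("Newport, Isle of Wight", -1.29, 50.70)],
    "gazetteer_b": [("Newport, Wales", -3.00, 51.59)],
}
print(rank_candidates(matches)[0][0])
```

Real geoparsers weigh many more signals (population, feature type, co-occurring place names in the document), so this should be read only as an illustration of why a ranked list, rather than a single answer, is returned.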

3.2 Technology Stack

The basic systems and process architecture for the WGP is illustrated in Figure 2. The development hardware currently consists of a web server (Windows Server 2008 OS running IIS 7.5 and GeoServer 2.1) and a database server (running Ubuntu Server 10.04 and PostgreSQL 8.4 with PostGIS 1.5). The web server has two primary roles: the first is to host the WGP and the second is to act as a secure gateway to the database when generating the metadata remotely. The dedicated database server performs the bulk of the data and geospatial processing as queries are performed on the WGP or data are stored through the metadata generation process. The WGP technology stack is composed entirely of Open Source software components, ranging from Microsoft’s ASP.NET Model-View-Controller (MVC) (http://www.asp.net/mvc) architecture (Microsoft Public License (MS-PL)) for server-side programming elements to ExtJS/GeoExt (GNU GPL license v3) for the user interface. This approach has enabled us to adopt a flexible development process in terms of identifying the best software and libraries to develop the WGP. The technology stack currently used for the WGP is shown in Figure 3.

Figure 2 WISERD Geoportal architecture

There are numerous fully functional Open Source databases currently available for application development (for example PostgreSQL, MySQL, SQLite and Firebird). However, spatial data handling and functionality vary across these database systems. Of the leading Open Source database systems, PostgreSQL, MySQL and SQLite have active spatial development strands. Obe and Hsu (2011) have identified PostGIS for PostgreSQL as having more spatial functionality and complexity than MySQL Spatial and SpatiaLite (SQLite). Coupled with the enterprise functionality and scalability of the underlying database system (PostgreSQL), PostGIS (http://postgis.refractions.net/) has proved to be the ideal spatial database solution for the WGP. PostGIS provides a number of spatial analysis functions and storage capabilities that allow for the creation, update, querying and delivery of the metadata and associated geographies to the end user via the WGP interface. These include functions that allow intersection analysis, generalization and simplification, buffer analysis, spatial indexing and coordinate transformations of the metadata generated using the metadata tools described previously. PostGIS also allows the output of data in ‘web-friendly’ formats such as Geographic JavaScript Object Notation (GeoJSON) and Well-Known Text (WKT) for consumption by JavaScript libraries such as OpenLayers and GeoExt. This functionality allows the end user to perform complex spatial and text-based searches and to view the metadata and geographies relating to the search criteria. The specific tools relating to these search capabilities are described in more detail later.
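To show how the ‘web-friendly’ output step fits together, the sketch below wraps rows that a PostGIS query could return via `ST_AsGeoJSON` into a GeoJSON FeatureCollection for OpenLayers. The SQL is shown but not executed: `ST_AsGeoJSON`, `ST_Transform` and `ST_SimplifyPreserveTopology` are real PostGIS functions, but the table and column names, the tolerance parameter and the `%(name)s` placeholder style are assumptions, and the rows are hard-coded stand-ins for a database result. The WGP's own server code was written in ASP.NET MVC, not Python.

```python
import json

# Hypothetical query: simplify, reproject to WGS84 and emit GeoJSON text per row.
SQL = """
SELECT survey_id, title,
       ST_AsGeoJSON(ST_Transform(ST_SimplifyPreserveTopology(geom, %(tolerance)s), 4326))
FROM survey_coverage
WHERE survey_id = ANY(%(ids)s)
"""


def to_feature_collection(rows):
    """Wrap (id, title, geojson_text) rows into a GeoJSON FeatureCollection dict."""
    features = []
    for survey_id, title, geometry_json in rows:
        features.append({
            "type": "Feature",
            "id": survey_id,
            "properties": {"title": title},
            "geometry": json.loads(geometry_json),  # ST_AsGeoJSON returns text
        })
    return {"type": "FeatureCollection", "features": features}


# Stand-in for a database result: one closed polygon ring encoded as ST_AsGeoJSON would.
rows = [("S1", "Example survey",
         '{"type":"Polygon","coordinates":[[[-3.2,51.4],[-3.1,51.4],[-3.1,51.5],[-3.2,51.4]]]}')]
payload = json.dumps(to_feature_collection(rows))
print(payload)
```

Doing the simplification and reprojection in the database, as sketched, keeps the payload sent to the browser small, which matters when many survey boundary polygons are returned at once.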

In addition to the spatially referenced quantitative and qualitative research data residing on the database server, there was also a need to serve ‘standard’ GIS data to the web mapping interface in the form of administrative/special designation boundaries. These data, loaded into the WGP table of contents on start-up, are useful for potential WGP users as they can be overlaid on any other map layers retrieved from the database as a result of a text or spatial search. In order to minimize the workload of the database server and increase the speed of the application, these data are served to the WGP using a geospatial software server, namely GeoServer (http://geoserver.org). GeoServer is arguably the most mature of the current crop of Open Source geospatial data server software tools and is good for efficiently publishing standards-compliant spatial data to the Web. The main components of the WGP’s graphical user interface (GUI) itself were developed using a combination of the OpenLayers (http://openlayers.org/), GeoExt (http://geoext.org) and ExtJS client-side JavaScript-based Open Source software libraries. OpenLayers is a key component of GeoExt, and its function is to manage and display the map data, comprising base mapping (currently provided by the Ordnance Survey (OS OpenData™)), spatial data from the meta-database (as WKT/GeoJSON), and boundary data from a web mapping service (WMS). ExtJS and other GeoExt components are then used to create the map controls and other elements of the interface for user interaction, such as the table of contents and the search, help and user account controls. Textual metadata and data retrieved from the database are displayed using ExtJS windows and forms, which are also used to control input parameters for map layer display and for metadata reporting and analysis.

In addition to the benefits that these cutting-edge JavaScript libraries bring to streamlining the development of the various GUI tools and components, the development team are able to draw on the styling and functionality of the modern UI ‘widgets’ (data grids, windows, toolbars, menus, etc.) within these libraries to build an application that is visually more akin to a desktop software GUI than a ‘traditional’ website interface.
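The boundary layers served by GeoServer reach the map client through standard OGC WMS requests. The sketch below builds a WMS 1.1.1 GetMap URL with the Python standard library. The endpoint, the layer name `wiserd:unitary_authorities`, the bounding box and the image size are hypothetical placeholders rather than the WGP's configuration, and EPSG:27700 (British National Grid) is assumed on the basis that boundaries are overlaid on Ordnance Survey base mapping.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def wms_getmap_url(base_url, layer, bbox, width=512, height=512,
                   srs="EPSG:27700", image_format="image/png"):
    """Return a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy) in srs units."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value selects the layer's default style
        "SRS": srs,              # WMS 1.1.1 uses SRS; WMS 1.3.0 renamed it CRS
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
        "TRANSPARENT": "TRUE",   # lets boundaries overlay the base map
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical GeoServer endpoint, layer and extent (approximate, in metres).
url = wms_getmap_url("http://localhost:8080/geoserver/wms", "wiserd:unitary_authorities",
                     (150000, 160000, 360000, 400000))
print(parse_qs(urlparse(url).query)["LAYERS"])
```

In practice OpenLayers builds such requests itself from a WMS layer definition; the point of the sketch is only that the boundary data travel as rendered images through a standard protocol, so the database server is not involved.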


Figure 3 WISERD GeoPortal Technology Stack


ASP.NET MVC was chosen as the web framework for the WGP because it allowed the data integration team to use existing skills to employ an MVC architecture without the need to learn a new programming language, which enabled a faster and more efficient development process. The three components that make up an MVC architecture (the model, view and controller) can be simply defined as follows. The model controls the application logic and processes user inputs; the view renders a user interface so that an end user can interact with the model; and the controller receives the user input, initiates the model, and delivers a response to the view based on the results from the model. The MVC architecture isolates different aspects of the WGP and therefore allowed members of the data integration team to develop and test the user interface and domain logic independently.
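The division of labour described above can be illustrated with a deliberately tiny sketch. It is written in Python rather than the C# used with ASP.NET MVC, and all class and method names are hypothetical; it only shows how the three roles, as defined in this article, keep domain logic, presentation and request handling apart so that each can be tested on its own.

```python
from typing import Dict, List


class SearchModel:
    """Model: application logic, here a keyword match over an in-memory catalogue."""

    def __init__(self, catalogue: Dict[str, str]):
        self.catalogue = catalogue  # resource id -> title

    def search(self, keyword: str) -> List[str]:
        kw = keyword.lower()
        return sorted(rid for rid, title in self.catalogue.items() if kw in title.lower())


class JsonView:
    """View: renders model results for the client (a dict standing in for JSON)."""

    def render(self, keyword: str, ids: List[str]) -> dict:
        return {"query": keyword, "count": len(ids), "results": ids}


class SearchController:
    """Controller: receives the request, invokes the model, passes results to the view."""

    def __init__(self, model: SearchModel, view: JsonView):
        self.model = model
        self.view = view

    def handle(self, params: Dict[str, str]) -> dict:
        keyword = params.get("q", "").strip()
        ids = self.model.search(keyword) if keyword else []  # empty query -> no results
        return self.view.render(keyword, ids)


controller = SearchController(SearchModel({"S1": "Health Survey", "S2": "Travel Survey"}),
                              JsonView())
print(controller.handle({"q": "health"}))
```

Because `SearchModel` knows nothing about requests or rendering, it can be unit-tested directly, while `JsonView` could be swapped for an HTML view without touching the search logic.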

3.3 Functions

3.3.1 Metadata discovery

One of the primary aims of the WGP is the discovery of socio-economic research data for Wales, via rich meta-databases of quantitative and qualitative data. The WGP GUI currently allows users to interrogate these meta-databases interactively in two main ways: text-based keyword querying and spatial querying. The text-based search function uses PostgreSQL text searching functionality to search for keywords in the titles and questions of the survey meta-database and to match words in the qualitative meta-database. Although querying is currently restricted to keywords, a more advanced text search tool is under development that will allow users to further refine queries using numerous input parameters (time/date, WISERD research themes, data type, etc.), string searches and Boolean operators. When a text search has been completed, the results are returned to the user as JavaScript Object Notation (JSON) and displayed in an ExtJS tabbed window (see Figure 4). The WGP has a number of GIS functions that enable a user to discover metadata via a spatial search. A place-name search allows users to enter a location as a text string and specify a circular buffer distance around that point in which to conduct the search. In order to counter user error, the place-name text entry box connects to EDINA Unlock’s open data gazetteer of world place names (http://unlock.edina.ac.uk/places.html), from which the user must select a known location using predictive text. A second buffer function works in a similar fashion, but rather than using a place name to derive the origin of the buffer, the tool allows a user to drop a place marker anywhere on the map, from which a buffer distance can be specified. The buffer tools use the ST_Within function in PostGIS to retrieve any geo-referenced data within the search area.
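The buffer search amounts to asking which geo-referenced records fall inside a circle around the chosen origin. The sketch below shows a hypothetical PostGIS query that combines `ST_Within` with `ST_Buffer`, alongside an equivalent in-memory Python filter for projected coordinates in metres. The table and column names, the use of EPSG:27700 and the sample coordinates are assumptions, and this is not the WGP's actual query. (`ST_DWithin` is a common alternative that can exploit a spatial index without constructing the buffer polygon.)

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical query: records whose geometry lies inside a circular buffer (not executed here).
BUFFER_SQL = """
SELECT record_id
FROM qualitative_places
WHERE ST_Within(geom, ST_Buffer(ST_SetSRID(ST_MakePoint(%(x)s, %(y)s), 27700), %(radius_m)s))
"""

Record = Tuple[str, float, float]  # (record id, easting, northing)


def buffer_search(records: Iterable[Record], x: float, y: float, radius_m: float) -> List[str]:
    """Return ids of records within radius_m metres of (x, y) in a projected CRS.

    Uses exact Euclidean distance. ST_Buffer approximates the circle with a
    polygon, so points lying almost exactly on the edge may be treated differently.
    """
    return [rid for rid, e, n in records if math.hypot(e - x, n - y) <= radius_m]


# Illustrative points in metres.
records = [("T1", 318000.0, 176000.0), ("T2", 319500.0, 176000.0), ("T3", 330000.0, 190000.0)]
print(buffer_search(records, 318000.0, 176000.0, 2000.0))  # ['T1', 'T2']
```

Using a projected coordinate system keeps the buffer distance in metres; with longitude/latitude the same radius would have to be expressed in degrees or computed on the sphere.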
Additional spatial search tools that are currently under development will enable users to perform spatial searches using administrative boundary data, or by drawing rectangular or irregular polygons on the map. Figure 5 shows an example of a spatial search being performed in the WGP using the place-name buffer function. Because a spatial search retrieves spatial data from the database, its results, unlike those of a text search, are displayed as map layers in the WGP’s table of contents (Figure 5), where a user can right-click any layer to view its metadata in more detail (see also Metadata Viewing and Analysis in this section). Retrieved map layers are organized by data type (survey data, qualitative data, admin data and ‘grey’ data) in the table of contents. Survey data is displayed in the map using boundary area polygons, while the qualitative metadata is mapped as points based on the place names extracted by the automated metadata generation software.
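The polygon-based searches that were still under development reduce, for point-referenced qualitative records, to a point-in-polygon test. In PostGIS this would typically be another `ST_Within` (or `ST_Intersects`) query against the drawn polygon; the sketch below instead shows the standard even-odd ray-casting test in plain Python so that the logic is visible. The function names and coordinates are illustrative assumptions, not part of the WGP.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting: count crossings of a ray cast from pt in the +x direction."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around, so the ring need not be closed
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)  # y1 != y2 here
            if x < x_cross:
                inside = not inside
    return inside


def polygon_search(records: Iterable[Tuple[str, float, float]], ring: Sequence[Point]) -> List[str]:
    """Return ids of (id, x, y) records that fall inside the drawn polygon."""
    return [rid for rid, x, y in records if point_in_polygon((x, y), ring)]


# An irregular (L-shaped) polygon, as a user might draw it.
l_shape = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]
print(polygon_search([("A", 2, 7), ("B", 7, 7), ("C", 7, 2)], l_shape))
```

The even-odd rule handles concave shapes such as the L above, which is why it suits free-hand "irregular polygon" searches as well as rectangles.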

3.3.2 Metadata viewing and analysis

Once data has been retrieved from the database via a keyword or spatial query, a number of tools and components allow a WGP user to explore and analyse the returned metadata. When a text search is performed, the results are returned to the end user in an ExtJS window that displays the query results and divides them by data type (survey data, qualitative data, ‘grey’ data and administrative data) using tabbed panels (see Figure 4, ‘Results Categories’ caption). If users discover a dataset of interest, they can view the full metadata record (this form can also be accessed for map data by right-clicking on the layer in the map table of contents) or view the spatial extent of the data by adding it as a layer to the map table of contents. The ExtJS forms showing the full metadata records for survey/administrative data and qualitative data are shown in Figures 6 and 7, respectively.
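The division of text-search results into tabbed categories could be prepared on the server roughly as follows. This is a hypothetical sketch rather than the WGP’s implementation: the row fields (`id`, `title`, `data_type`), the response keys (`success`, `tabs`, `dataType`, `count`, `results`) and the function name are all assumptions.

```python
import json
from typing import Dict, Iterable, List

# Tab order follows the four result categories named in the text.
DATA_TYPES = ("survey", "qualitative", "grey", "admin")


def group_results_for_tabs(rows: Iterable[Dict]) -> str:
    """Bucket search hits by data type and serialize them as JSON.

    Each row is assumed (hypothetically) to carry 'id', 'title' and
    'data_type' keys. Empty categories are kept so that the client can
    still draw an empty tab; unexpected types get a tab of their own.
    """
    groups: Dict[str, List[Dict]] = {t: [] for t in DATA_TYPES}
    for row in rows:
        groups.setdefault(row["data_type"], []).append(
            {"id": row["id"], "title": row["title"]}
        )
    payload = {
        "success": True,
        "tabs": [
            {"dataType": t, "count": len(items), "results": items}
            for t, items in groups.items()
        ],
    }
    return json.dumps(payload)
```

Keeping the category order fixed on the server means each tabbed panel can be bound to a known position in the response.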

Using the survey metadata form (Figure 6), users are able to explore the various levels of metadata available for the chosen survey (Dublin Core, survey, question and response), view any associated survey response tables in a separate window, or view other questions in the same survey. Interactive graphing tools can also be accessed from this form, enabling a user to analyse and visualize the survey response statistics for any geographical/administrative areas where response data is available (Figure 8). This can inform the user about the sample sizes of responses to survey questions for geographical areas of interest, and whether these are large enough for quantitative analysis. Furthermore, it allows users to compare the sample sizes of responses to questions in different surveys and, if the questions and response categories are sufficiently similar, to judge whether data from different surveys can be ‘pooled’ to increase the sample size enough to allow robust quantitative analysis.

The qualitative metadata form (Figure 7) also has additional tools that enable further value to be extracted from the data. If a user wants to find out more about the content of a particular research interview, for example, they are able to view a word/tag ‘cloud’ (Figure 9) of the keywords within the interview text, where the relative prominence of each term in the document is indicated by the size and colour of the text font. Another function developed for discovering more about qualitative interview data, without infringing copyright or revealing any potentially disclosive material, is the qualitative data geography analysis tool. This tool shows the frequency and location of place names within the source document without revealing the source text itself (Figure 10). These place names can also be added to the map as proportional point symbols, giving a researcher an indication of how relevant the research or interview might be for their particular study area. All metadata reports and analyses can then be saved to a WGP user’s personal account and/or exported to PDF.

Figure 4 Results returned from the database following a keyword search

Figure 5 Performing a spatial search with a user-defined buffer (in this example the user has opted to display the spatial extent of a survey returned by the query)
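The visual encodings and the pooling judgement described above can be sketched in a few lines of Python. The formulas and function names are assumptions, since the text does not specify them: tag-cloud font sizes are scaled linearly between a minimum and maximum point size, proportional symbols scale their area (so the radius grows with the square root of the count, the usual cartographic convention), and the pooling check simply sums sample sizes against an arbitrary threshold of 100.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def tag_cloud_sizes(
    words: Iterable[str], min_pt: float = 10.0, max_pt: float = 36.0
) -> Dict[str, float]:
    """Map each term to a font size that grows linearly with its frequency.

    The linear scale and the 10-36 pt range are illustrative assumptions.
    """
    counts = Counter(w.lower() for w in words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_pt + (c - lo) / span * (max_pt - min_pt) for w, c in counts.items()}


def symbol_radius(count: int, max_count: int, max_radius_px: float = 20.0) -> float:
    """Proportional point symbol: circle *area* scales with the place-name count."""
    return max_radius_px * math.sqrt(count / max_count)


def can_pool(sample_sizes: Iterable[int], minimum: int = 100) -> bool:
    """Crude check: would pooling comparable questions reach a minimum n?

    The threshold of 100 is arbitrary; comparability of the questions and
    response categories must still be judged by the researcher.
    """
    return sum(sample_sizes) >= minimum
```

The pooling check only adds up counts; deciding whether questions and response categories are similar enough to pool remains a substantive judgement.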

4 Discussion and Future Developments

The WGP, still in its alpha phase of development, requires immediate work in a few key areas before a beta version is released for general user testing in January 2012. Enhancing the performance of dynamically generating spatial data from database queries is particularly important for improving the user experience. Because geospatial software such as GeoServer did not, until recently, support the generation of dynamic SQL layers, a solution has been developed whereby attribute data for survey and administrative metadata is linked to spatial boundary data on the fly within the PostGIS database. The linked tables, geographies and attributes are then exported from the database using a combination of WKT for the spatial data and JSON for the attribute data, and the whole package is delivered to the browser as compressed JSON. In the browser, OpenLayers renders the WKT geographies as a vector data layer, and the attached attributes extracted from the JSON are used to populate the ExtJS metadata forms. As the database is expected to grow substantially as more metadata is added, improvements in the performance of database queries must be sought. For example, a spatial search using a 5 km buffer typically takes 20 seconds or more to calculate because of the large amount of spatial data returned as part of the query (~100 Mb). Improvements will be achieved by re-working the process of dynamically generating spatial data from the database (drawing upon the new SQL and long URL functionality in GeoServer), developing more effective database indexing strategies, and investing in more powerful database server hardware. This will reduce the load on the client browser when rendering the spatial data and reduce the need to return large spatial datasets from the database as part of the query. The flexibility of open-source software has allowed us to investigate and develop new methodologies as required, something we would have found more difficult to do using proprietary systems.

Figure 6 Full metadata record for survey/administrative data

Figure 7 Full metadata record for qualitative data
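The WKT-plus-JSON delivery mechanism described above might be packaged as in the following Python sketch. The text does not say which compression scheme is used, so gzip is assumed here; the payload layout (`type`, `features`, `wkt`, `attributes`) and the function names are hypothetical. On the client, OpenLayers would parse each `wkt` string into a vector feature and the `attributes` would feed the ExtJS forms.

```python
import gzip
import json
from typing import Dict, List


def pack_layer(features: List[Dict]) -> bytes:
    """Bundle WKT geometries and their attributes into one gzip-compressed JSON payload.

    Each feature is assumed to be a dict with a 'wkt' string and an
    'attributes' dict; the payload layout is hypothetical.
    """
    payload = {
        "type": "wkt-layer",
        "features": [
            {"wkt": f["wkt"], "attributes": f["attributes"]} for f in features
        ],
    }
    # Compact separators trim whitespace before compression.
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def unpack_layer(blob: bytes) -> Dict:
    """Inverse of pack_layer, standing in for the decoding done in the browser."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

Boundary data is text-heavy (long coordinate strings), which is why compressing the package before delivery pays off.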

The use of detailed area boundary data files in spatial queries means that file sizes for geographical data are large, so such data needs to be generalized. While generalizing spatial data using PostGIS can reduce file sizes by up to 97% (e.g. from 1.7 MB to 47.6 KB) for delivery to the browser, with a further 50% reduction possible by employing JSON compression (e.g. down to 24 KB), problems arise with maintaining the topology of the original data. The generalization function (ST_Simplify) in the current version of PostGIS fails to maintain the correct topology of complex data that is heavily simplified, resulting in large holes between polygons when the data is rendered on the map. Work on resolving this issue is underway, and indications are that the full topology support included in PostGIS 2.0 and/or the use of new GeoServer functionality (long URLs and SQL Views) will go some way towards resolving it.

Another facet of the WGP that requires improvement before a full beta version is released is the quality of the cartography, particularly the labelling in OpenLayers, which needs further development to produce more aesthetically pleasing maps. Recent research has drawn attention to the importance of map presentation (in particular semantics, geometries, levels of detail, labels and symbols) for view services in geoportals (Harrie et al. 2011). The development team are also working on functionality to display the attribute response tables for surveys, so that when a user moves the cursor over the dynamically generated boundary polygons on the map, the attribute data will be displayed. More generally, improvements are being made in several areas with regard to rendering both quantitative and qualitative metadata search results on the map.

Figure 8 Interactive charting tools allow a user to analyse survey responses by geography
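ST_Simplify uses the Douglas-Peucker algorithm, and a minimal pure-Python version helps explain the generalization problem noted above. The algorithm keeps the vertex farthest from the chord joining a line’s end points whenever it lies beyond the tolerance, recurses on both halves, and otherwise discards the interior vertices. This is an illustrative sketch with hypothetical function names, not the PostGIS implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # degenerate chord: fall back to point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Simplify a polyline, keeping its end points fixed."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the chord first-last.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]  # every interior vertex is within tolerance
    # Otherwise keep that vertex and simplify each half recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex
```

Because each polygon is simplified independently, the boundary shared by two neighbouring areas can be reduced to different vertex sets in each, opening the gaps seen on the rendered map. This is why a topology model in which each shared edge is stored once, as in PostGIS 2.0, is expected to help.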

A beta version of the WGP is scheduled for user testing in January 2012, with a version 1.0 release planned for delivery in the first quarter of 2012. User testing will be conducted with a large pool of testers, including associate members of WISERD such as external researchers and government agency officials, as well as WISERD researchers more closely associated with the project. This evaluation will draw upon the latest research in usability evaluation for geospatial technologies (Haklay and Zafiri 2008, Marsh and Haklay 2010) and take on board some of the empirical approaches to evaluation practice documented, for example, by Lance et al. (2011). The WGP will continue to be refined and developed, incorporating any feedback/issues from the beta testing, until its version 1.0 release, with further development of the application possible beyond 2012, subject to funding.

Figure 9 Tag cloud generated from research interview metadata

One of the most important future development tasks will be to expand the WISERD meta-database and incorporate new types of data, such as environmental data and publicly available linked data published on the Internet, in addition to adding more metadata for existing data categories. We shall also explore the possibilities of adding commercial and tracking data. The speed of database querying also needs to be improved, particularly for the more computationally intensive spatial searches, and investment in database server hardware is expected to help in this respect. Furthermore, in addition to the current options, functionality will be developed that will allow a user to rank search results by relevance and/or date, and work will continue on the data analysis and visualization tools in order to extract maximum value from the available metadata. An interactive help system is also planned, and the WGP will eventually support all web browsers (it is currently optimized for Mozilla Firefox).

De Longueville (2010, p. 301) presents some ideas on next-generation geoportal developments as open systems “that supports the discovery, exchange, advertisement and delivery of geospatial resources on the Web”. Furthermore, he suggests that geoportals can act as a bridge between spatial data infrastructures and the geospatial Web 2.0. In our future research we will explore such possibilities, so future developments may also include building community features into the WGP, such as blogs and a forum, where researchers can exchange information and ideas on research and research data, with the aim of facilitating more collaborative working. In addition, a long-term goal is to shift the ‘power’ towards the user in terms of metadata generation by incorporating the metadata entry software applications, highlighted in Section 3 of the article, into the WGP, giving users the ability to upload their own metadata to the database. Where feasible, the download of source data, including data in GIS formats, may also be explored; publishing metadata for consumption by external websites and services (as web mapping services or RDF, for example) is another possible area for future development.

Figure 10 Qualitative metadata geography analysis tool
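Ranking search results by relevance and/or date, listed above as planned functionality, could build on PostgreSQL’s `ts_rank` function. In this hypothetical sketch the table `metadata_record`, its columns `title` and `published`, and the function name are assumptions; the sort options are whitelisted because an ORDER BY clause cannot be passed as a bound parameter.

```python
from typing import Tuple

# Whitelisted ORDER BY clauses; user input selects a key, never raw SQL.
ORDERINGS = {
    "relevance": "score DESC",
    "date": "published DESC NULLS LAST",
    "both": "score DESC, published DESC NULLS LAST",
}


def ranked_search_sql(keywords: str, by: str = "relevance") -> Tuple[str, Tuple[str, str]]:
    """Keyword search over record titles, ordered by ts_rank and/or date."""
    order = ORDERINGS[by]  # raises KeyError for an unsupported option
    doc = "to_tsvector('english', coalesce(title, ''))"
    query = "plainto_tsquery('english', %s)"
    sql = (
        f"SELECT id, title, published, ts_rank({doc}, {query}) AS score "
        f"FROM metadata_record WHERE {doc} @@ {query} "
        f"ORDER BY {order}"
    )
    # The keywords are bound twice: once for ranking, once for filtering.
    return sql, (keywords, keywords)
```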

5 Conclusions

The WGP has been specifically developed to enhance the availability, accessibility and (re-)usage of social science data amongst academics and policy makers in Wales. The geoportal was developed in response to perceived gaps in knowledge about the existence and provenance of key data sets used to guide policy. The geoportal is unique in that, in our review of over 120 geoportals worldwide, we have not previously encountered a geoportal dedicated to supporting quantitative and qualitative social science academic and policy research. It also answers a wider call in the UK for the increased use of secondary and administrative data, often collected by government agencies and some private sector organizations as part of their day-to-day operational tasks, at a time of tight fiscal budgets. The WGP has been developed to comply with the most up-to-date national and international metadata standards. The data infrastructure is distinctive in that it has been built around metadata records (rather than source data records), which gives it sophisticated search, discovery and data integration functionality. It also makes it easy to attach and integrate source data records to permit data (and metadata) sharing where possible. The WGP has been developed purposively to include a wide range of disparate data sources, so it will be attractive to researchers from a variety of social science backgrounds and methodological approaches. It can also be used to reveal spatial, temporal and thematic ‘gaps’ in existing data across Wales and to identify ways of addressing these gaps through the integration and repurposing of other data sources, for instance by aggregating individual-level administrative data to bespoke areas.

However, as highlighted in this study, implementing a geoportal to integrate quantitative and qualitative data is far from a trivial task and has involved using many different types of open-source software programs and modules to address these overarching aims (where proprietary software would have lacked flexibility); this experience should be of immediate relevance to other groups interested in developing such tools. Given the nature of the data currently in the meta-databases, the WGP should be of interest to government agencies that are exploring methods for facilitating re-use and repurposing of their publicly funded data; data repositories and data archives that store both quantitative and qualitative data and require a mechanism by which both can be searched, queried and displayed; and university research centres and research groups that hold research data locally but have no clear mechanism for data discovery and data sharing. The last example may become pertinent if European universities, as public bodies, are obliged to expose the spatial metadata of the research data they hold under the auspices of the INSPIRE directive.

We are about to undertake a period of user testing of a beta version that will allow us to improve and develop the WGP and gain feedback on the data and functionality that users of social science data would like from the WGP. Initial feedback from ‘expert’ users who have participated in demonstrations of the alpha version has been positive; they have been particularly impressed with the range of social science data on the WGP and the ability of the WGP to bring it together coherently along thematic and spatial dimensions. We hope that the WGP represents a new kind of social science research tool: one that both allows access to all kinds of social science data relevant to academic and policy research and encourages collaboration between researchers from different social science fields and approaches.

Acknowledgements

This article is based on research supported by the Wales Institute of Social and Economic Research, Data and Methods (WISERD), funded by the Economic and Social Research Council (ESRC) (Grant Reference: RES-576-25-0021) and the Higher Education Funding Council for Wales (HEFCW).

References

AGI 2009 UK Gemini Standard: A UK Metadata Standard for Discovery of Geographic Data Resource. WWW document, http://www.gigateway.org.uk/metadata/pdf/UK_GEMINI_v1.pdf

Armstrong M P, Nyerges T L, Wang S, and Wright D 2011 Connecting geospatial information to society through Cyberinfrastructure. In Nyerges T L, Couclelis H, and McMaster R (eds) The SAGE Handbook of GIS and Society. London, Sage Publications: 109–22

Beaumont P, Longley P A, and Maguire D J 2005 Geographic information portals: A UK perspective. Computers, Environment and Urban Systems 29: 49–69

Berry R, Fry R, Higgs G, and Orford S 2010 A geoportal for enhancing collaborative socio-economic research in Wales using Open Source technology. Journal of Applied Research in Higher Education 2: 77–92

CIOC 2009 Open Source, Open Standards and Re-use: Government Action Plan. London, The Cabinet Office (available at http://www.cabinetoffice.gov.uk/media/123372/090224opensource.pdf)

De Longueville B 2010 Community-based geoportals: The next generation? Concepts and methods for the geospatial Web 2.0. Computers, Environment and Urban Systems 34: 299–308

Giuliani G and Peduzzi P 2011 The PREVIEW global risk data platform: A geoportal to serve and share global data on risk to natural hazards. Natural Hazards and Earth System Sciences 11: 53–66

Goodchild M F, Fu P, and Rich P M 2007 Geographic information sharing: The case of the Geospatial One-Stop portal. Annals of the Association of American Geographers 97: 250–66

Haklay M and Zafiri A 2008 Usability engineering for GIS: Learning from a screenshot. Cartographic Journal 44: 87–97

Harrie L, Mustiere S, and Stigmar H 2011 Cartographic quality issues for view services in geoportals. Cartographica 46: 92–100

Harris T M, Rouse J, and Bergeron S 2010 The Geospatial Semantic Web, Pareto GIS and the humanities. In Bodenhamer D, Corrigan J, and Harris T M (eds) The Spatial Humanities: GIS and the Future of Humanities Scholarship. Bloomington, IN, Indiana University Press: 124–42

Lance K T, Georgiadou Y P, and Bregt A K 2011 Evaluation of the Dutch subsurface geoportal: What lies beneath? Computers, Environment and Urban Systems 35: 150–58

Maguire D J and Longley P A 2005 The emergence of geoportals and their role in spatial data infrastructures. Computers, Environment and Urban Systems 29: 3–14

Marsh M and Haklay M 2010 Evaluation and deployment. In Haklay M (ed) Interacting with Geospatial Technologies. Oxford, Wiley-Blackwell: 199–221

Masser I 2011 Emerging frameworks in the Information Age: The Spatial Data Infrastructure (SDI) phenomenon. In Nyerges T L, Couclelis H, and McMaster R (eds) The SAGE Handbook of GIS and Society. London, Sage Publications: 271–86

National Data Strategy 2009 UK Strategy for Data Resources for Social and Economic Research, 2009–2012. Swindon, UK, Polaris House

Nyerges T, Ramsey K, and Wilson M 2007 Design considerations for an Internet portal to support public participation in transportation improvement decision making. In Balram S and Dragicevic S (eds) Collaborative Geographic Information Systems. Hershey, PA, Idea Group: 208–30

Obe R O and Hsu S L 2011 PostGIS in Action. Greenwich, CT, Manning Publications

OpenCalais 2011 How Does Calais Work? WWW document, http://www.opencalais.com/about

OGC 2004 Geospatial Portal Reference Architecture. Wayland, MA, Open Geospatial Consortium Technical Report No. 04-039 (available at http://portal.opengeospatial.org/files/?artifact_id=6669)

Tait M G 2005 Implementing geoportals: Applications of distributed GIS. Computers, Environment and Urban Systems 29: 33–47

Tang W and Selwood J 2005 Spatial Portals: Gateways to Geographic Information. Redlands, CA, Esri Press

Trinidad S B, Fullerton S M, Ludman E J, Jarvik G P, Larson E B, and Burke W 2011 Research practice and participant preferences: The growing gulf. Science 331: 287–88

Yang P, Evans J, Cole M, Marley S, Alameh N, and Bambacus M 2007 The emerging concepts and applications of the spatial web portal. Photogrammetric Engineering and Remote Sensing 73: 691–98
