sara catarina santos silva definition and implementation of … · development and integration of...
TRANSCRIPT
Escola de Engenharia
Sara Catarina Santos Silva
Definition and Implementation o f a Metadata
Application Profile of the SusCity Project
Tese de Mestrado em
Engenharia e Gestão de Sistemas de Informação
Trabalho efetuado sob a orientação da
Professora Doutora Ana Alice Baptista
Março 2017
ii
Resumo
As cidades são o centro do desenvolvimento económico e social, onde uma percentagem
significativa da população mundial reside, tendo este número tendência a aumentar num futuro
próximo. Por esta razão, a eficiência de recursos é fundamental para a sustentabilidade das
cidades e, desta necessidade, nasceu o conceito de Cidade Inteligente.
O projeto SusCity baseia-se neste conceito; É um projeto que se enquadra no âmbito do
protocolo com o MIT e foca-se no desenvolvimento e integração de novas ferramentas e serviços
para aumentar a eficiência dos recursos urbanos.
O SusCity divide-se em seis pacotes de trabalho sendo que, esta dissertação se foca na
tarefa “Publicar dados abertos da cidade” compreendida no pacote de trabalho número dois a
cargo da Universidade do Minho. Consequentemente, o objetivo desta dissertação passa pela
definição e implementação de uma framework para publicar dados abertos no âmbito deste
projeto.
Para realizar o trabalho recorrer-se-á à metodologia Me4MAP. Esta metodologia tem como
objetivo o suporte para o desenvolvimento e implementação de um MAP e tem como ponto de
partida o RUP e a framework de Singapore estabelecida pela DCMI. Encontra-se divida em quatro
fases sendo elas, definição do âmbito, construção, desenvolvimento e validação e cinco atividades,
definição dos requisitos funcionais, modelo de domínio, description set profile e guias de utilização
e sintaxe.
Até ao momento, foi desenvolvido este documento referente ao estado de arte. Após a
entrega deste documento passar-se-á à implementação da metodologia Me4MAP tendo em
consideração os dados fornecidos pelos parceiros do projeto.
Palavras-chave: Web Semântica, Cidade Inteligente, Dados Abertos, Metadados, Interoperabilidade Semântica, Perfil de Aplicação.
iii
Abstract
Cities are at the centre of economic and social development and where a significant percentage of
the world’s population lives and this number is likely to increase in the near future. Resource
efficiency is, therefore, fundamental to the sustainability of cities, and from this need, the concept
of Smart City is born.
The SusCity project was based on this idea; An MIT project that focuses on the
development and integration of new tools and services to increase the efficiency of urban resources.
This project is divided into six work packages, and this dissertation focuses on the task
“Publish city open data” in work package two. Hence, this dissertation aims at a definition and
implementation of an application profile to publish Open Data under the scope of the SusCity
project.
In this dissertation, the Me4MAP methodology will be used. This method aims to support
the development and implementation of a Metadata Application Profile (MAP) and has as starting
points the Singapore Framework and RUP. It is divided into four phases, scope definition,
construction, development and validation and five activities, functional requirements, domain
model, description set profile, usage guidelines and syntax guidelines.
Until now, a literature review was carried out. From this point on, the Me4MAP
methodology will be implemented taking into account the data provided by the project partners.
Keywords: Semantic Web, Smart City, Open Data, Metadata, Semantic Interoperability, Metadata
Application Profile
iv
Table of Contents
Chapter 1
1 Introduction .................................................................................................................................... 1
1.1 Contextualization .................................................................................................................... 1
1.2 Motivation for the Study .......................................................................................................... 3
1.3 Relevance of the Study in the Field of Information Systems ..................................................... 3
1.4 Thesis Organization ................................................................................................................ 4
2 State of the Art ............................................................................................................................... 5
2.1 Smart Cities ........................................................................................................................... 5
2.1.1 Smart Cities Initiatives .................................................................................................... 6
2.2 Open Data ............................................................................................................................. 9
2.2.1 Open Data and Smart Cities .........................................................................................14
2.2.2 Open Data Initiatives ....................................................................................................17
2.3 Linked Open Data ................................................................................................................22
2.3.1 Metadata and Metadata Schemas ................................................................................26
2.3.2 Metadata Application Profile .........................................................................................28
2.3.3 From Open Data to Linked Open Data ..........................................................................31
2.3.4 LOD and Smart Cities ..................................................................................................34
2.3.5 LOD and Smart Cities Initiatives ...................................................................................35
3 Study to be Carried Out ................................................................................................................40
3.1 Conceptualization of the problem to be studied and its underlying objectives .........................40
3.2 Methodological Approach .....................................................................................................42
3.3 Activities Plan .......................................................................................................................46
4 Final Considerations .....................................................................................................................48
5 References ...................................................................................................................................49
v
List of Tables
Table 1 - Delivery dates for the dissertation ....................................................................................... 47
vi
List of Figures
Figure 1 - SmartSantander Logical architecture and building blocks ...................................................... 7
Figure 2 - Santander Dados Abiertos Web Site ..................................................................................... 8
Figure 3 - Amsterdam Smart City Web Site ......................................................................................... 9
Figure 4 - RDS Proposal .................................................................................................................. 13
Figure 5 - Motivation for Opening Data (EUROCITIES, 2013) .............................................................. 17
Figure 6 - Open Data Platform LifeCycle ........................................................................................... 18
Figure 7 - Amsterdam Open Data Site .............................................................................................. 20
Figure 8 - Amsterdam Open Data Dataset ......................................................................................... 21
Figure 9 - London Data Store Site .................................................................................................... 22
Figure 10 - RDF Graph with two nodes ............................................................................................. 24
Figure 11 - MAP Model applied with the Singapore Framework (Nilsson et al., 2008) ........................... 30
Figure 12 - Me4MAP V0.2 phases (Malta & Baptista, 2013) ............................................................... 31
Figure 13 - 5 Star Open Data Model ................................................................................................. 31
Figure 14 - Cost and benefits of 1-star web data (Bauer & Kaltenböck, 2012) ...................................... 32
Figure 15 - Cost and benefits of 2-stars web data (Bauer & Kaltenböck, 2012) .................................... 32
Figure 16 - Cost and benefits of 3-stars web data (Bauer & Kaltenböck, 2012) ................................... 33
Figure 17 - Cost and benefits of 4-stars web data (Bauer & Kaltenböck, 2012) .................................... 33
Figure 18 - Cost and benefits of 5-stars web data (Bauer & Kaltenböck, 2012) .................................... 34
Figure 19 - Software Components architecture of the IES Cities Platform ............................................. 37
Figure 20 - IES Cities from a citizen’s perspective .............................................................................. 37
Figure 21 - IES Cities user’s manager ............................................................................................... 38
Figure 22 - IES Cities from the municipally perspective ...................................................................... 38
Figure 23 - IES Cities from a developer perspective ........................................................................... 39
Figure 24 - IES Cities from a developer perspective .................................................................... 39
Figure 25 - Star City Application ...................................................................................................... 40
Figure 26 - Me4MAP phases (Malta & Baptista, 2013) ....................................................................... 43
Figure 27 - Me2MAP V0.2 life-cycle development model (Malta & Baptista, 2013) ................................ 44
Figure 28 - Example of a Domain Model (Coyle & Baker, 2009) .......................................................... 45
vii
List of Acronyms
AGLS Australian Government Locator Service AP Application Profile CC Creative Commons CR Candidate Recommendation DCAM Dublin Core Abstract Model DCAP Dublin Core Application Profile DCMES Dublin Core Metadata Element Set DCMI Dublin Core Metadata Initiative DCTERMS DSR
Dublin Core Metadata Initiative Metadata Terms Design Science Research
DSP Description Set Profile EU European Union FCT Fundação para a Ciência e a Tecnologia FOAF Friend of a Friend vocabulary GIS HTTP
Geographic Information System HyperText Transfer Protocol
ICT Information and Communication Technologies IEFT Internet Engineering Task Force IoT Internet of Things IRI Internationalized Resource Identifier ISO International Organization for Standardization LOD MAP
Linked Open Data Metadata Application Profile
NGO Non-governmental organisation OWL Web Ontology Language PR Proposed Recommendation PRISMA PlafoRms Interoperable Cloud for SMArt-Government RDF Resource Description Framework RDFS RDS
Resource Description Framework Schema Responsible Data Science
RUP Rational Unified Process URI Uniform Resource Identifier URL SKOS
Uniform Resource Locators Simple Knowledge Organisations Systems
W3C World Wide Web Consortium WD Working Draft WP Work Package WR W3C Recommendation XML eXtensible Markup Language
1
1 Introduction
1.1 Contextualization
Cities are at the heart of economic and social development, where a significant percentage of the
world’s population resides, and this number is likely to increase in a nearby future. Resource
efficiency is, therefore, fundamental to the sustainability of cities, and from this need, the concept
of Smart City is born. A Smart City is a city that invests in human and social capital, traditional
(transport) and modern (ICT) infrastructures to enable a sustainable economic development and
ensure a high quality of life with the intelligent management of natural resources (Caragliu, Del Bo,
& Nijkamp, 2011).
The SusCity project was born from this concept of Smart City; An MIT Portugal project,
funded by FCT, Ministry of Education and Science of Portugal, EDP, ADENE, R&D Nester, Novabase
and iTds and created in partnership with INESCPorto, R&D Nester, IST-ID, iTds, MIT, lnEG, Adene,
Coimbra University, EDP, Minho’s University, Novabase, IBM, Faculty of Sciences of the University
of Lisbon, idMEC and Lisbon City Council. SusCity is a project that focuses on the development
and integration of new tools and services to increase the efficiency of urban resources. The SusCity
is distributed into six Work Packages (WP) and this dissertation is developed under the scope of
WP2 “Information Services and Data Processing Platform” task six “Publish City Open Data”, and
the goal is to research and develop mechanisms for selecting and making the data available under
a Linked Open Data (LOD) perspective.
Organisations that provide public services are the ones that contain data about cities. For this
project, it is crucial that partners such as EDP and ADENE release their data to the public as Open
Data. The Open Knowledge International1 defines Open Data as data that is freely available for
anyone to use and republish as they wish, without any copyright restrictions, patents or any other
mechanism of control.
Tim Berners-Lee (2005), the father of the web and LOD, refers to the Semantic Web as a Web
of machine-readable information whose meaning is well-defined by standards. These standards are
used to facilitate an exchange of data and metadata by promoting consistency in the Web. A
Standard or a Recommendation is a specification or set of guidelines with a high level of maturity,
1 https://okfn.org/
2
and that has passed an extensive process of evaluation and is ready to be adopted by the
community.
These Standards and Recommendations are developed by rename organisations, in the field
of open data and LOD, like the DCMI and W3C and with the community that is going to adopt
them. In LOD it is essential to obey to these Standards because they aim Semantic Interoperability.
Semantic Interoperability is defined to be “the ability of computer systems to exchange data with
unambiguous, shared meaning. Semantic Interoperability is a requirement to enable computable
machine logic, inferencing, knowledge discovery, and data federation between information
systems”2.
Metadata and Metadata Schemas play a significant role in the development of the Semantic
Web. The Semantic Web links data together, and metadata can provide the connections as well
as the description of the content. Metadata is known to be anything that describes anything or
“data about data” (Baca, 2016) and, a metadata schema is a set of metadata elements defined
for a particular purpose in an individual context (Coyle & Baker, 2009). Although metadata is an
integral part of the Semantic Web, metadata on its own is far from sufficient. Again, we need
standards to encode and represent knowledge so tasks can be performed in an efficient and
comprehensible manner. A variety of enabling technologies have been developed over the last
few years that are critical for metadata encoding and manipulation such as XML and XML
Schema, RDF and RDFS and OWL (Greenberg, Sutton, & Campbell, 2003).
The DCMI defines an Application Profile as a set of metadata elements, policies, and
guidelines defined for a particular application. These elements can be from one or more elements
sets allowing us to use several element sets including ones created by us. With this, all the
functional requirements of an application can be covered. It is considered a best practice that
everyone that creates an Application Profile, should document the policies and best practices
appropriate to the application.
These are some of the key aspects that constitute the Semantic Web. In the following sections
of this document, these are going to be cover in more detail.
2 https://www.ncoic.org/home
3
1.2 Motivation for the Study This study aims to create an Application Profile for the SusCity data and to publish the data under
a LOD perspective. This data is about energy and energy efficiency in buildings. A research work
was carried out, prior to this dissertation, to investigate if Application Profiles, Properties and
Vocabularies regarding energy and energy efficiency in buildings existed. The conclusion of this
study was that such resources were almost non-existing. Hence, the opportunity of this work is to
create and publish them on the Web for everyone’s use. With this, the aim is to contribute to the
LOD community and to promote Semantic Interoperability.
1.3 Relevance of the Study in the Field of Information Systems
Information Systems have become the backbone for most organisations. In almost every sector –
education, finance, government, healthcare, manufacturing, and business large or small –
information systems play a prominent role (van der Aalst & Stahl, 2011).
“Information Systems are combinations of hardware, software, and telecommunications
networks that people build and use to collect, create, and distribute useful data, typically in
organisational settings” (Jessup & Valacich, 2008). In practice, the achievement of cooperation
between these different components depends on the effectiveness of the interoperation between
participating systems. The interoperation between participating systems is where it can be best
shown the relevance of the Semantic Web in the field of Information Systems. The Semantic Web
relies on the effectiveness of interoperation between systems; it aims interoperability at a global
level where data can flow without borders.
Most of the Web’s content is intended for humans to read, not for computer programs to
manipulate meaningfully. They do not have reliable ways to process the semantic values of things.
The Semantic Theory, in which the Semantic Web relies on, provides an account of “meaning” in
which the logical connection of terms establishes interoperability between systems (Shadbolt,
Berners-Lee, & Hall, 2006). This interoperability comes from the fact that, every system uses the
same term to define the same thing.
The scientific impact of this dissertation is the definition of an Application Profile (AP) and
possibly metadata schemas, controlled vocabularies and properties for open data concerning
energy and energy efficiency of buildings.
4
The technical contribution is gained through the publishing and implementation of the
developed Application Profile, Properties and Vocabularies in a LOD platform.
1.4 Thesis Organization
This thesis proposal is organised into four chapters. Chapter two introduces a reflection of concepts
such as Smart City, Open Data and Linked Open Data. For each concept, is presented some
initiatives related to them. Chapter three exposes the characterization of the study. In here the
project’s objectives, the methodological approach and the activities plan are portrayed. At the end
of this documents are the final considerations.
5
2 State of the Art
2.1 Smart Cities
Cities are the centre of economic and social development. Where a large percentage of the world’s
population lives, and this number only tends to increase in a nearby future. Due to vast and complex
congregations of people, resource efficiency is fundamental to the sustainability of the city, and
from this need arises the concept of “Smart City”. This concept is used all over the world with
different nomenclatures, context, and meanings resulting in its inconsistent usage (Chourabi et al.,
2012). Cities present a set of characteristics very different from one another's, mainly because of
its historical evolution, present aspects or future provisions. The development of the concept of a
“Smart City” rises from a complex association between technology, society, economy,
administration and politics. Because of this, the expansion of this concept is going to diverge from
city to city aiming their specific goals, visions, and policies (Ramalho, 2015).
According to Ramalho (2015), there is clearly three broadly visions from which a “Smart
City” can be viewed. He defends that the development of a “Smart City” should be approached
from a perspective of administration, or from a societal and urban planning appulse or even from
the development of information and communication technologies (ICT). Despite this, a “Smart City”
could mitigate the problems caused by the urban population growth and rapid urbanisation
(Chourabi et al., 2012). Most of these problems can be related to traffic jams, environment
pollution, and natural resource limits (Pan et al., 2013). The early identification of problems related
to population growth would also allow to monitor, analyse and plan the city to improve the
efficiency, equity and quality of life of its citizens in real time (Batty et al., 2012).
For much of the 20th century, the idea that a city could be smart was something that came
out of a science fiction movie. Currently, this is no longer an idea since the massive proliferation
of intelligent computable devices anticipated that a city could indeed become smart with the help
of ICT infrastructure. In this regard, Betty et al (2012) labels a Smart City as a city in which ICT
merges with traditional infrastructures. This ICT perspective of a Smart City is the one that is going
to be pursued in this dissertation because this is the one that relates the most with the work that
is going to be developed. From this technological point of view, smartness means that we want our
cities to be understandable, to have the capability of learning and to be self-aware so, the analysis
and mining of sensed data from dynamic cities are a necessary step towards making a city smart
(Pan et al., 2013).
6
The first step to publish a City Open Data is to collect its data. The favoured way is to
publish raw data. Tim Berners-Lee in a 2009 TED talk3 advocated the release of data as raw data
(“Raw data now!”). Raw data is unadulterated data that is collected from the source. In some
cases, the release of raw data may not be possible because of privacy issues. Nevertheless, this
problem can be overcome by anonymizing the data. The full potential of open data relies on LOD
that makes possible to discover relationships between different data. The most common formats
of LOD are RDFa, JSON-LD, Turtle and N-Triples and RDF/XML. These are the formats that are
going to be used to publish the SusCity data.
In the following section, some smart cities initiatives are presented to illustrate the full
potential of this concept.
2.1.1 Smart Cities Initiatives
The use of programs to support the development of smart cities with the use of ICT aim at
promoting economic competitiveness, better quality of life to its citizens and to promote
environmental sustainability (Nam & Pardo, 2011).
In 2012, there were approximately 143 ongoing or complete smart cities projects around the
world. Among these initiatives, about 35 projects in North America and 47 projects in Europe were
leading efforts to implement smart technologies to address and resolve some more immediate
urban problems (Alemu, Stevens, & Ross, 2012).
In the next section, some smart cities initiatives are presented.
2.1.1.1 SmartSantander
The SmartStantander project is one of the projects of the Future Internet Research and
Experimentation initiative of the European Commission and represents a unique, in a world city-
scale, experimental research facility (Sanchez et al., 2014). This project envisions the deployment
of 20,000 sensors in Belgrade, Guilford, Lübeck, and Santander. SmartStantander’s project aims
to produce the following key target outcomes: 1) An architectural reference model for open real-
world Internet of Things experimentation facilities; 2) A scalable, heterogeneous and trustable large-
3 http://www.ted.com/talks/tim_berners_lee_on_the_next_web/
7
scale real-world experimental facility; 3) A representative set of implemented use cases for the
experimental facility; and 4) A large of set of Future Internet experiments and results (Sanchez et
al., 2014).
Figure 1 - SmartSantander Logical architecture and building blocks
The city of Santander has an Open Data platform named Santander Datos Abiertos4 where
they publish the city’s data. The purpose of this platform is to increase transparency between the
city of Santander and its citizens, to promote the reuse of public information and to be a source of
innovation.
In Santander Open Data Platform there are about 88 datasets available about traffic,
transportation, science, technology, environment, culture and infrastructures. The municipality of
Santander advocates for the data to be published in different formats such as XML, HTML, JSON,
N3, RDF, CSV, TURTLE, JSON-LD, ATOM, RSS, SHP, WKT, KMZ and XLS. At the time of the search,
December 2016, 82 datasets in XML were found, 83 in HTML, 82 in JSON, 82 in N3, 83 in RDF,
84 in CSV, 82 in TURTLE, 82 in JSON-LD, 82 ATOM and 15 in RSS. These numbers show that the
website cares deeply about LOD since the majority of their datasets are in LOD formats.
4 http://datos.santander.es/
8
Figure 2 - Santander Dados Abiertos Web Site
2.1.1.2 Amsterdam SmartCity
Amsterdam Smart City5 is a platform of the Amsterdam Metropolitan Area that appeals the
contribution of business, residents, the municipality and knowledge institutions to solve urban
issues.
This initiative has a project regarding Open Data that involved a development of an Open
Data Platform. data.amsterdam.nl aims at strengthening the economy of the Amsterdam
metropolitan area by opening public data sources to citizens and businesses. With this, citizens,
businesses, research institutions and other parties, are enabled to develop services that wouldn’t
be possible before.
This project was initiated by the Amsterdam Economic Board, Waag Society, Vrije University,
Universiteit van Amsterdam and 2CoolMonkeys. The project was partly financed by the European
Regional Development Fund of the European Union. Since 2015, Gemeente Amsterdam (OIS) is
managing data.amsterdam.nl.
This platform focuses on six different themes such as Circular City, Citizens & Living, Energy
Water & Waste, Governance & Education, Infrastructure & Technology and Mobility.
On the website, at the date of the search, December 2016, is possible to see that there are
about 306 datasets from which 35 were added in the last 12 months from 41 different publishers.
The data is available in the following formats, XLS, HTML, JSON, CSV, XML, PDF and, WMS.
5 https://amsterdamsmartcity.com
9
Figure 3 - Amsterdam Smart City Web Site
2.1.1.3 Smart London
London is the biggest city in Europe, but size comes with a lot of urban problems as well, such as
transport, energy, and healthcare related. The Smart London approach is about improving the lives
of Londoners with the use of, but not limited to Open Data & transparency.
Regarding Open Data & Transparency, the London DataStore6 was created to make public
data open and accessible to citizens. From this data, numerous apps were developed by the
community that helped the city of London to function better.
In London DataStore the topics of the available data are very broad and go from health,
transports, business, sports, culture and the environment. There are about 75 publishers that
release open data to citizens.
2.2 Open Data
“Information is power. But, like all power, there are those who want to keep it for themselves.”
Aaron Swartz
Human rights are described as the fundamental rights that people acquire, just by the fact of being
human. In the 1948 United Nations Universal Declaration of Human Rights7 it is settled that these
rights include cultural, economic, and political rights, such as the right to life, liberty, education,
6 https://data.london.gov.uk/ 7http://www.un.org/en/universal-declaration-human-rights/
10
and equality before the law, the right of association, belief, free speech, religion, movements, and
information.
Amid Information rights is the right to create and communicate information (e.g., freedom
of expression, freedom of associations), to control others’ access to information (e.g., privacy and
intellectual property), and rights to access information (e.g., free of thought, the right to read)
(Mathiesen, 2008). Open data targets the last one, the right to access information.
John Wilbanks, Vice President of Science at Creative Commons once said “Numerous
scientists have pointed out the irony that right at the historical moment when we have the
technologies to permit worldwide availability and distributed process of scientific data, broadening
collaboration and accelerating the pace and depth of discovery ... we are busy locking up that data
and preventing the use of correspondingly advanced technologies on knowledge.” The digital world
is the only place where something can be shared and those who share it do not lose anything, they
still continue with the exact same thing that they had before doing it. Imagine the follow, in the real
world if I had 5 candies and if I shared 3 candies with my friends I would only have 2 candies left.
In the digital world, I could share as many candies as I would like to and I would have the exact
amount of candies that I had before sharing. Why not take advantage of this, especially regarding
scientific data, where there is so much underlying knowledge waiting to be discovered. At this
moment we have the technology, the only thing missing is people's compliance. Currently, one of
the leading organisations working with this movement of opening scientific data is the Open
Knowledge International.
The Open Knowledge International is a community-based organisation that promotes open
knowledge, which encompasses Open Data, free culture, the public domain, and other areas of
the knowledge commons. Founded in 2004, the Open Knowledge International has grown into an
international network of communities that develop tools, applications, and guidelines enabling the
opening up of data, and subsequently the discovery and use of that data. Its working groups are in
fields as broad as government, science, transport, education, open access, economics, personal
data and privacy.
Their call is that “Open knowledge can empower everyone, enabling people to work
together to tackle local and global challenges, understand our world, expose inefficiency and
challenge inequality and hold governments and companies to account”8. In today’s world
information is power and is a key to knowledge and everything around us. With this, they want to
8 https://okfn.org/about/ (Access 22/02/2017)
11
promote everyone’s access to information and the ability to use it. They envision “a world where
knowledge creates power to the many and not a few”9.
The Open Knowledge International acknowledges “Open Data as data that can be freely
used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute
and share alike”. They defend that everyone should be able to make informed decisions about
many subjects, such as the place we live, what we buy or who gets our vote. The ultimate intent is
to increase transparency between citizens and data holders with the use of Open Data.
One crucial aspect for Open Data is the matter that everyone can use it as it wishes, even
to create commercial value. To guarantee this premise is imperative to license the data to assure
data users that they will not suffer any repercussions from using the data. The Open Data
Commons10, an organisation that develops legal tools for Open Data and is part of the Open
Knowledge Foundation Network, presents three different types of licences for Open Data. 1) Public
Domain Dedication and Licence (PDDL) this permit places the data(base) in the public domain that
is, the author ultimately gives up his rights respecting the data(base). 2) Attribution Licence (ODC-
By) with this licence the user is free to share, create and adapt as he wishes as long as he attributes
any public use of the database, or works produced from the database, in the specified manner in
the licence. 3) Open Database Licence (ODC-ODbL), the data user, is also free to share, create
and adapt as long as he attribute(s) any public use of the database, or works produced from the
database, in the manner specified in the licence. Also, if the user publicly uses any adapted version
of the database, or work produced from an adapted database, he must also release that adapted
database under the same licence, and finally he must keep the data(base) open.
Like the Open Data Commons, the Creative commons (CC)11 also provides licences to Open
Data. Creative Commons (CC) licences are the most used and recognised standard licences for
providing access to data and other resources. These licences permit the free of charge copying,
reuse, distribution and, in some cases, the modification of the original creator’s creative work,
without having to obtain permission every single time from the rights holder. Similarly to Open Data
Commons these licences also have various derivations regarding the different type of uses that one
can make.
Every day, extraordinary amounts of digital information are created about nearly every
aspect of our lives. Many believe that locked within all that data is the key to knowledge about how
9 https://okfn.org/ (Access 22/02/2017) 10 https://opendatacommons.org/ 11 https://creativecommons.org/
12
to cure diseases, create business value, and govern our world more effectively. The idea with Open
Data is to make data available to the public that concerns citizens and enables everyone to use it
without any charge. With this opening, we locate ourselves in a state of collective intelligence where
everyone can see what we are doing and can even contribute with feedback. The fundamental idea
is that, under the right circumstances, groups can generate better alternatives and make better
decisions than even the smartest people can do on their own.
Open Data should be made available in open, non-proprietary and easily processable,
machine readable formats that enable the reuse and redistribution of data. An open and non-
proprietary format is one where the specifications for the software are available to anyone, free of
charge so that anyone can use the specifications in their software without any limitations on re-use
imposed by intellectual property rights. Besides this, there is a set of principle that must be taken
into account when releasing or using data. The FORCE1112 is a community of scholars, librarians,
archivists, publishers and research funders that work together to facilitate the change toward
knowledge creation and sharing. This community released the FAIR data principles13 that are a set
of guiding principles that aims to make the data Findable, Accessible, Interoperable, and Re-usable
to do this, semantic web technologies such as RDF and IRIs are used. Another initiative to use and
share data in a responsible way was conducted by leading Dutch research groups from multiple
disciplines that joined forces and created the Responsible Data Science14 (RDS) consortium, where
they state that the data should be confidential and accurate. The results derived from the data
should be fair and transparent. (See figure 4)
12 https://www.force11.org/ 13 https://www.force11.org/group/fairgroup/fairprinciples 14 http://www.responsibledatascience.org/
13
Figure 4 - RDS Proposal
While the opening of data is expected to create benefits such as 1) political and social, 2)
economic, and 3) operational and technical benefits (Janssen, Charalabidis, & Zuiderwijk, 2012).
It is also projected to produce benefits like stimulating innovation and therefore promoting
economic growth. Increase the transparency between the publisher and the consumer of data
because the user itself can verify whether the conclusions drawn from the data are correct and
justified. It can also strengthen accountability, build trust, and improve citizen satisfaction. These
benefits increase when the data is about what governments are doing, and this is increasingly
recognised as an essential precondition to the meaningful exercise of democratic accountability
and deliberation (Janssen et al., 2012). To account democratic responsibilities and to improve the
citizen's trust is why government data is one of the more requested to be open.
While the opening of data can potentially provide numerous benefits, it can also entail
some barriers. The nature of these barriers can be at an institutional level, or about the task
complexity of handling the data, the use of open data and participation in the open data process,
legislation, information quality, and at the technical level (Janssen et al., 2012). The most
common barriers are related to either data providers, that do not wish to publicise data, or data
users, that don’t have the ability to use the data a straightforward manner. With this it rises
another problem, such as Gurstein M (2011) said, even though the notion of Open Data alleges
that everyone can benefit from it, this can actual not be the reality. In Gurstein opinion, only a
few number of “lucky” people with the resources and technical expertise can fully benefit from
it. He mentions that the majority of people would fall short in this concept because of some
factors such as Internet access, or because people do not understand the language in which the
14
data is presented, or lack the technical and professional requirements for interpreting and making
use of the data.
On this subject, one could agree with the fact that Gurstein M. said that because of some
significant events such as internet access or even lack of technical expertise, that Open Data may
not reach every citizen of the world. Nevertheless, this should not dismiss the social impact that
open data can bring to society. The fact is that many movements for example such as the creation
of the World Wide Web would also fall short with reaching every people in the world when it first
appeared. Only with its development and with people compliance it was possible for the Web to
have an enormous influence in the world and people’s lives and, the same thing may be expected
with Open Data. Tim Berners-Lee, the inventor of the Web, perfectly describes this idea, he said
“Tough I was privileged to lead the effort that gave rise to the Web in the mid-1990s, it has long
passed the point of being something designed by a single person or even a single organisation. It
has become a public resource for many individuals, communities, companies, and governments
depend on. And from its beginning, it is a medium that has been created and sustained by the
cooperative efforts of people all over the world”. The conclusion to this is that yes, maybe only a
few number of "lucky" people can fully work with the concept of open data. However, for a greater
good of the society, these people need to work together in a state of collective intelligence and
collaboration for the world to benefit from it. The data itself may present some barriers, but
everything in the world does, and that shouldn't dismiss the impact that open data can have in
every citizen of the world. While only this lucky people can create something with open data,
everyone can use it and benefit from it.
2.2.1 Open Data and Smart Cities
From the previous section, we known that Open Data is data that anyone can access, use and
share. It can help us all to better understand and interact with our cities, whether it is information
on local housing, real-time train time from rail companies, or finding supermarket locations from
retailers.
The Open Data Institute15, one of the leading initiatives regarding Open Data, claims that a
smart city is an open city and keeping infrastructure and markets open is the only scalable way to
ensure equitable and secure growth in our cities. This opening means that we need data to take
15 http://theodi.org/
15
full advantage of what our city offers and to improve its citizen’s quality of life while fostering
innovation and therefore promoting economic growth.
Bettine Tratz-Ryan, research vice president at Gartner, said that for citizens “Developing
‘smartness’ in their eyes means developing contextual applications for them”. A contextual
application is an application that will augment the ability to perceive and act at the moment based
on where we are, who we are with and our past experiences. A perfect example of a contextual
application is Google Now. Google Now is an intelligent personal assistant developed by Google
that answers user’s questions, makes recommendations, performs actions, and delivers user’s
information that it predicts (based on their search habits).
In recent years, due to citizens increasingly demand access to meaningful data, cities are
responding by building platforms that improve the municipality service and urban quality of life with
the promotion of Open Data. These data platforms should be as accessible as possible and meet
the needs of coders, who have the technical expertise and can use the platform to retrieve data;
data owners, who want the identity of the organisation to be reflective in the portal; and the ‘general’
audience, who can look for information about data and develop applications. With this data, citizens
can take the initiative, “do it for themselves”, innovate and co-create. Indeed there is a concept
named “citizen science” that is known to be a partnership between volunteers and scientists to
answser real-world questions. This concept emerged in 1995 and (Irwin, 1995) reclaimed two
dimensions of the relationship between science and citizens: 1) that science should be responsive
to citizens’ concerns and needs; and 2) that citizens themselves can produce reliable scientific
knowledge. Actually, collaborations between scientists and volunteers have the potential to broaden
the scope of research and enhance the ability to collect scientific data (Cohn, 2008). This means
that citizens themselves can help to produce open data or create something through it. About this
ability of citizens to collect data, there is another concept known as “Humans as sensors” where
individuals can make observations about the physical world around them (Wang et al., 2014). With
this in mind, sites such as Wikimapia16 and OpenStreetMap17 are empowering citizens to create a
global patchwork of geographic information (Goodchild, 2007), to create a world map done by
citizens. This increase in civic participation promotes a wave of Open Data Innovation which can
lead the city one step closer to become a "Smart City".
16 http://wikimapia.org 17 www.openstreetmap.org
16
Before publishing the data on to this Open Data Platforms some key steps must be taken to make
full use of it. The EuroCities Handbook suggest that the first step that must be taken by cities who
want to release their data, is to decide what data is going to be open. It is a good idea that cities
focus on data about issues of high local relevance that have the greatest potential to raise interest
amoung its citizens. Then, they need political support, about open data policies and the
engagement of all the city administration organisation in developing approaches to address
disclosure and access to data. Working with stakeholders is also of high relevance because, in
most cases, the developer of services will not be the public organisation itself, but companies,
interest groups, non-governmental organisations (NGOs), individual citizens, and students. The city
should support stakeholders to promote the data and bring the right people together through, for
example, local hackathons and data demonstration days. The city should also promote data literacy
amongst citizens to overcome scepticism related to this being a relatively new concept. The city
must also overcome privacy issues of the data by aggregating the data to the right level and seek
legal support from open data initiatives. Finally, the last step is all about data quality. There can be
issues concerning the quality of data but, rather than seeing this as a barrier, the public’s
organisations can use it as an opportunity to receive feedback and improve its quality.
To reveal the benefits and motivation of a city to open its data the working group of
EUROCITIES in collaboration with the EU Open Cities project, conducted a survey, in 2011, where
thirteen cities responded. This study found that the primary motivation for open data is the goal of
achieving transparency in governmental and administrative processes (100%). Likewise, of great
importance for cities is the subject of ‘innovation’ (92%). The reuse of the datasets (76%), efficiency
gains (76%) as the potential of creating economic value (69%) was also motivation for open data in
cities. Only about half of the cities (53%) viewed ‘citizen participation’ and ‘crowdsourcing ideas’
as a stimulus for open data. (See figure 5).
17
Figure 5 - Motivation for Opening Data (EUROCITIES, 2013)
While these Open Data Platforms may have many benefits, it also encounters barriers to its use
and adoption. The top barrier of these platforms is its perceived poor quality of data available on
the platforms. This poor quality of Open data includes poor metadata, failure to use the right format
for different audiences, and difficulty in locating data of interest.
2.2.2 Open Data Initiatives
With transparency at aim, cities are adopting the concept of Open Data to increase the trust of its
citizens. In the following section, some initiatives regarding Open Data and smart cities are
presented.
2.2.2.1 Open Cities – Open Data Platform18 19
Open Cities is a project co-founded by the European Union that enables cities throughout Europe to publish their data as Open Data. This project promotes the creation of a civic web and mobile applications, having in mind the improvement of quality in services, lower costs and improved transparency between the city and its citizens.
Developed by Fraunhofer, the Open Cities Open Data Platform is an open source platform that can be easily customised to match a public or private organisation’s requirements.
18 https://github.com/fraunhoferfokus/opendata-platform 19 http://open-data.fokus.fraunhofer.de/en/platform/
18
These platforms support the entire Open Data lifecycle process, which includes identifying, publishing, discovering, enriching, and consuming data.
The standard configuration of the platform uses CKAN (Comprehensive Knowledge Archive Network) backend for storing the metadata. CKAN provides numerous features including customizable metadata schemas and Liferay based frontend it also has a repository for linked data based on Virtuoso20.
Unlike other platforms (e.g, Socrata21), this platform is free and it is available as open source software under the AGPL22 (Affero General Public License) terms. This license states the Open Data Platform as free software, that can be redistributed and modified.
Figure 6 - Open Data Platform LifeCycle
2.2.2.2 Open Data Europe Portal
The European Union Open Data Portal23 was launched in December of 2012 and is a single point
of access to a wide range of data held by EU institutions, agencies and other bodies. This portal
allows to search, explore, link, download and reuse the available data to commercial and non-
commercial purposes, through a catalogue of common metadata. Because of this catalogue, users
can access data stored in EU institutions, agencies and other body’s websites. This portal also
uses semantic technologies to allow to search the metadata catalogue via an interactive search
engine (Data tab) and through SPARQL queries (Linked data tab).
20 https://virtuoso.openlinksw.com/ 21 https://socrata.com/ 22 https://www.gnu.org/licenses/agpl-3.0.html 23 open-data.europa.eu
19
Ordinary users can engage with this portal by suggesting which data they would like to see
available, give feedback on the quality of the data obtained and share information with other users
about how they used the data.
The portal is available in 24 EU official languages but, most metadata is available in a
limited number of languages (English, French and German).
The reuse policy of this portal is implemented by the Decision of 12 of December 2011 –
Reuse of Commission documents24. In this Decision, reuse is defined as “the use of documents by
persons or legal entities of documents, for commercial or non-commercial purposes other than the
initial purpose for which the documents were produced”. Conditions for the reuse of documents
can be applied. Those conditions, which shall not restrict the reuse, include the following: (i) the
obligation for the reuser to acknowledge the source of the documents; (ii) the obligation not to
distort the original meaning or message of the documents; (iii) the non-liability of the Commission
for any consequence stemming from the reuse. Any other conditions can be imposed with an open
licence or a disclaimer setting out the conditions.
When the site was visited on the 4th January of 2017, 64 publishers were EU institutions,
bodies or departments (e.g., Directorate – General for Communication, European Parliament,
Eurostat, publications officer, European Central Bank, and others). There was a total of 9255
datasets available to use.
The data is released in many formats, and each dataset can be released in multiple formats.
The most commonly use form to release data is ZIP (32%). ZIP is a property format, which means
that it needs a particular software to be able to use its data. The use of proprietary formats goes
against of the commonly accepted formats for open data which requires them to be machine
readable and non-proprietary. The second, third and fourth, most used formats, HTML, text, XML
respectively correspond to 60% of the formats in which the datasets are released. These formats
are machine-readable and non-proprietary.
24 http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2011:330:0039:0042:EN:PDF
20
2.2.2.3 Amsterdam Open Data25
The city of Amsterdam has an Open Data portal that currently uses DataPress. DataPress is an
open source company that creates websites for hosting Open Data in the cloud. It uses WordPress
and CKAN, an open-source data portal platform. The Amsterdam Open Data portal aims at
strengthening the economy of Amsterdam by unlocking available (public) data sources to citizens
and businesses. With this data citizens and businesses can develop services that would not be
possible before. With the Amsterdam Open Data portal they wish to get everyone involved in data
publishing, so it was created an interface that made it easy to publish and view data.
Some of the main benefits appointed by the project owner, Jasper Soetendal26 are: (i)
Transparency, in his view, in a well-functioning democratic society citizens need to know what their
government is doing. They must be able to freely access government data; (ii) Releasing social and
commercial value, by opening up data, this portal can help drive the creation of innovative business
and services that can deliver social and commercial value; (iii) Participatory Governance, with open
data citizens, are able to contribute in the process of governance; and (iv) Efficiency, Opening up
data to the public, may lead to internal efficiency as well. Since everybody can access the data, it
means that the data is available internally as well.
Figure 7 - Amsterdam Open Data Site
25 data.amsterdam.nl 26 https://amsterdamsmartcity.com/projects/dataamsterdamnl
21
The site was visited on the 3rd of January of 2017, the first thing that could be noticed was the
incompressible language (Dutch) in what the site was presented. When looking for a language
setting, it could not be found.
The same thing happened with the language in what the data was presented. To confirm
this, a dataset called “Activiteiten” was downloaded and opened (See fig. 6)
Figure 8 - Amsterdam Open Data Dataset
As it can be seeing through the image all attributes, except for their names and an attribute called
TitleEN, are in Dutch. Language is one of the most common barriers in Open Data. If the data is
in some language that only a significantly low number of people understand, it may never reach its
full potential. They can also argue that this data is for the use of the City’s citizens as it only
concerns them.
The data that this portal offers is in CSV, DIV, JSON, KML, PDF, txt; WFS, WMS, XLS and XML.
2.2.2.4 London Data Store27
Over 600 datasets detailing life in London have been put up online as Open Data enabling the free
reuse by anyone who wishes it. The topics of the datasets are very broad and include information
about, for example, planning decisions, crime rates, abandoned vehicles, house prices, road
accidents and many others. The data available in this data store is published in XLS, CSV, PDF,
XML, SHP, TSV, IMG and some other formats.
27 https://data.london.gov.uk/
22
Figure 9 - London Data Store Site
At the time the site was visited, on the 3rd of January of 2017, there were 689 datasets that seem
to be regularly updated. As we can see from the image, the latest updated was made 3 hours prior
to this visit in a 2014 dataset about Greenhouse gas inventory. This regular update in datasets
goes accordingly to the Open Data Principle of the Open Government Data28 that data should be
timely, that is, the value of the data must be preserved and that it should be made available
(updated) as quickly as necessary. Despite this, the formats in which the data is released do not
meet the requirements of interoperability and non-proprietary. As we can see through the image
506 datasets, which is more than half of the available datasets, were released in an XLS format.
This format is not a non-proprietary since it can only be open using Microsoft Excel. Of the 689
datasets, 313 that correspond to 45,42% of the datasets, were released in a non-proprietary format.
2.3 Linked Open Data
The World Wide Web has radically changed our lives and the way we share knowledge by lowering
the barrier of publishing and accessing documents as part of a global information space (Bizer,
Heath, & Berners-Lee, 2011). The notion of a Sematic Web was presented by Berners-Lee et al.,
2001. They stated that the Semantic Web is not a separate Web but an extension of the current
28 https://opengovdata.org/
23
one, in which information is given well-defined meaning, better-enabling computers and people to
work in cooperation. In recent years, we have been seeing a revolution of the Web from a “Web of
documents” into a “Web of data”. This change, as well as the availability of large collections of
sensor generated data (“Internet of Things”), and others types of data is leading to a new
generation of Web applications (Ferraram, Nikolov, & Scharffe, 2013). Because of this, the linking
task between different data is more crucial than ever. From this need, the concept of Linked Open
Data is presented as using the Web to create typed links between data from different sources.
Linked Open Data refers to data published on the Web in such a way it is machine-readable, its
meaning is explicitly defined, it is linked to other external data sets, and can, in turn, be associated
with/from external datasets (Bizer et al., 2011). The linking of data is the main point of the
Semantic Web because with this we can derive information and meaning from data.
Most of the Web’s content is designed for humans to read, not for computer programs to
manipulate meaningfully. Computers do not have reliable ways to process the semantic value of
things. The Semantic theory, in which the Semantic Web relies on, provides an account of
“meaning” in which the logical connection of terms establishes interoperability between systems
(Shadbolt et al., 2006). This interoperability comes from the fact that, every system uses the same
term to define the same thing and, the meaning of each term can be retrieved by the relation of
that term with other terms.
Until now, we have seen that for the Semantic Web to function we have to be able to link
different sources of data to gather its semantic value. However, it is also needed for computers to
have access to structured collections of information and sets of inference rules (Berners-Lee et al.,
2001). Computers can use this to automate reasoning and to create new sets of rules from the
data that is available to them.
Linked Open Data relies on two technologies that are fundamental to the Web (Bizer et al.,
2011): Resource Identifiers (IRI, URL) and the HyperText Transfer Protocol (HTTP).
Internationalized Resource Identifier (IRIs) are an essential part of the Semantic Web. They ensure
that concepts are not just words but are tied to a unique definition that everyone can find on the
Web (Berners-Lee et al., 2001). When an IRI is associated with a resource, anyone that wishes can
link to it, refer to it, or retrieve a representation of it. Uniform Resource Locators (URLs) in addition
to identifying a resource, like an IRI, provide a way of locating the resource by describing its primary
access mechanism (access protocol). The most common used access protocol is HTTP that
provides a universal mechanism for retrieving resources that can be serialised as stream or bytes
24
or retrieving descriptions of entities that cannot themselves be sent across the network in this way
(physical object) (Bizer et al., 2011).
Another important technology for developing the Semantic Web is Resource Description
Framework (RDF). In the beginning, when RDF first appeared it relied in eXtensible Markup
Language (XML). XML is used to describe a class of data objects and partially explains the
behaviour of computer programs which process them (Bray, Paoli, Sperberg-McQueen, Maler, &
Yergeau, 2008). In short, XML allows users to add arbitrary structure to their documents but says
nothing about the meaning of its structure (Berners-Lee et al., 2001). What is missing in XML is
the relation between the data and its semantic value. A machine cannot understand what the data
is about with XML. For this RDF can be used. RDF is a framework for expressing information about
resources. This resources can be anything, documents, physical objects, places, people and
abstract concepts. The primary use of RDF is for machines to process information. With RDF
information can be exchanged between applications without loss of meaning (Manola et al., 2014).
Its core structure of syntax is a set of triples, and these triples allow the formation of webs of
information about related things. Each triple is formed by a subject, a predicate, and an object (see
fig. 10) (Cyganiak et al., 2014). RDF formulates statements such as humans form sentences, this
is why RDF is so intuitive. To a set of triples, we call an RDF graph.
Another way to represent RDF is with the use of textual syntaxes that are used to do textual
representations of RDF graphs. RDF/XML is one example but has become outdated because it was
not as human-friendly as others, such as Turtle.
Figure 10 - RDF Graph with two nodes
An RDF triple consists of three components: (i) the subject, that can be an IRI or a blank
node; (ii) the predicate, which is an IRI and, (iii) the object that is an IRI, a literal or a blank node.
A component of an RDF triple can be described with (i) IRIs and (ii) Blank node is a node
representing some resource for which an IRI or literal is not given. A blank node can only be used
25
in the subject or object part of an RDF triple (Chen, Zhang, Chen, & Guo, 2012), and (iii) Literals.
Literals are used for values such as strings, numbers, and dates (Cyganiak et al., 2014).
In 2001, when Tim Berners-Lee et. al, first introduced the Semantic Web, they also
introduced the concept of collections of information that they called Ontology. They adapted the
idea from philosophy where is used as a theory about the nature of existence. In computer science,
an ontology can be described as “… means to formally model the structure of a system, i.e., the
relevant entities and relations that emerge from its observation. Moreover, which are useful for our
purposes” (Staab & Studer, 2009). One of the main tasks of the Semantic Web research is to
encode human knowledge into ontologies. This encoding is a particularly challenging task, and one
of the most challenging problems is the adaptation of these ontologies to new insights (Klein,
2004). To represent this knowledge and the relation between deferent entities, we present three
languages: Resource Description Framework Schema (RDFS), Web Ontology Language (OWL) and
Simple Knowledge Organisations Systems (SKOS).
RDF describes resources with classes, properties, and values. Additionally, RDF also needs
a way to define classes and properties. These classes and properties must be defined using
extensions of RDF, such as RDFS that provides a data modelling vocabulary for RDF data. RDFS
took the basic RDF specification and extended it to support the expression of structured
vocabularies (Brickley & Guha, 2014). RDFS is sufficiently expressive to describe a class hierarchy,
contrary to RDF that only provides the set of triples with no relation between them (Ossenbruggen,
Hardman, & Rutledge, 2006). One may consider that the expressivity of RDFS is roughly limited to
a subclass hierarchy and a property hierarchy, with domain and range definitions of these
properties and not enough to create Ontologies on the Web (Antoniou & Harmelen, 2004). OWL
was created from this identified need for a more powerful ontology modelling language. The OWL
was built upon RDF and RDFS and have the same kind of syntax as these two (Antoniou &
Harmelen, 2004) and it is used to do knowledge representation. On the other hand, one could use
SKOS that is an RDF vocabulary for making thesauri, controlled vocabularies and subject headings
available on the web (Miles & Bechhofer, 2009). Unlike OWL, SKOS is not a language for knowledge
representation but rather a language for knowledge organisation. One could see the use of SKOS
as a useful alternative to OWL when there is no need for a rich expressive logic (Jupp, 2010).
Linked Open Data would not be useful if we do not have a way to retrieve information from
it. A standard way to do this is to use the SPARQL language. In January 2008, SPARQL became a
W3C Recommendation.
26
W3C defined the SPARQL language as a set of specifications that provide languages and
protocols with a way to query and manipulate RDF graph content on the Web or in an RDF store.
The SPARQL queries are composed by three parts (Pérez, Arenas, & Gutierrez, 2009). 1) The
pattern matching part, consists of matching patterns in RDF graphs; 2) Solution modifiers, once
we have the result from the pattern matching part, we can modify those values applying classical
operators; and 3) output of SPARQL queries. This output can be of three types: yes/no queries,
selection of values of the variable which matches the patterns defined, or the construction of a new
RDF graph.
2.3.1 Metadata and Metadata Schemas
In last decade, we have seen a growth of digital repositories, due to new technologies, and
particularly because of the World Wide Web. Because of this, there has been a tremendous increase
in the need for data management, an intense interest in metadata in a wide range of communities,
and extensive development of metadata schemas (Greenberg, 2005). The definition of metadata
is best known as “data about data” this means that metadata is anything that describes anything
else (Baca, 2016). With metadata, we can describe everything that we want, but representing
different types of resources requires several types of metadata and metadata standards (Rühle,
Baker, & Johnston, 2011). Metadata should follow clear rules in metadata schemas (or schemes).
A metadata schema or Vocabulary schema specifies the names of elements and their
semantic value. These metadata elements have a very specific meaning inside de schema. The
definition of each element is known as the semantics of the schema (Niso, 2004). Besides this,
each element of the schema has defined rules for the values that are associated with the element
(Malta, 2014).
Currently, there are hundreds of vocabularies schemas being used, here are presented five
examples:
Dublin Core Metadata Element Set (DCMES)29 : This is a vocabulary that was created
by the Dublin Core Metadata Initiative (DCMI), a pioneer and one of the most influential
initiatives regarding metadata. The DCMES is a vocabulary composed of fifteen
properties for use in resource description. The nature of its elements is broad and
generic and can be used to describe a broad range of resources. More recently, was
29 http://dublincore.org/documents/dces/
27
created the DCMI Metadata Terms (DCTERMS) where were added properties to the
fifteen already created.
Schema.org30 : In 2011, the major search engines such as Bing, Google, and Yahoo
(later joined by Yandex) came together with the goal of providing a single schema
across a wide range of topics. This idea was to present webmasters (a person
responsible for maintaining one or more websites) with a single vocabulary that they
could use to represent a variety of things. The vocabularies are developed by an open
community process. Schema.org is a vocabulary that can be used with different
encodings, including RDFa, Microdata and JSON-LD. This vocabulary covers entities,
relationships between entities and actions.
DBpedia31: DBpedia is a crowd-sourced community effort to extract structured
information from Wikipedia and make that information available on the web. For each
entity retrieved from Wikipedia, BDpedia defines a globally unique identifier that can
be dereferenced according to Linked Data principles (Berners-Lee, 2006). Then they
publish RDF links pointing from DBpedia into other Web data sources. The DBpedia
covers a wide range of topics such as geographic information, people, companies,
films, music, genes, drugs, books and scientific publications.
Friend of a Friend (FOAF) vocabulary32: Defines terms for describing persons, their
activities and their relations to other people and object. The FOAF terms are grouped
into three specific categories:
o Core – classes, and properties that form the core of FOAF;
o Social Web – terms for describing Internet accounts, address books and other
Web-bases activities;
o Linked Data Utilities – set of terms that are useful to the Web community.
Australian Government Locator Service (AGLS) Metadata Standard33 - Is a metadata
schema created by the Australian Governments and is primarily concern with
describing government services and information resources for discovery and retrieval
purposes, although its further development aims to facilitate the transaction of
government business online (McKemmish, Acland, Ward, & Reed, 1999).
30 http://schema.org/ 31 http://wiki.dbpedia.org 32 http://xmlns.com/foaf/spec/ 33 http://www.agls.gov.au/
28
The use of international standards when we are talking about interoperability is fundamental. Many
organisations create standards for the community to adopt. Next, some examples are ilustrated:
The Internet Engineering Task Force (IETF)34: In their site, is stated that their mission is to
“make the Internet work better”. They do that with the development of documents that
include protocol standards, best current practices, and informational documents of various
kinds;
International Organization for Standardization (ISO)35: ISO is an independent, non-
governmental international organisation that develops standards for a broad range of
domains. Currently, they have published more than 21000 international standards;
World Wide Web Consortium (W3C)36: An organisation founded in 1994 by Tim Berners-
Lee that is the main international standards organisation for the Web. The W3C standard
formation process consists of four maturity levels: (i) Working Draft (WD), the very first step
to standardisation and is published for review by the community; (ii) Candidate
Recommendation (CR), is more mature than the WD and the group responsible for the
standard is satisfied that the standard meets its goal; (iii) Proposed Recommendation (PR),
this standard has passes the prior two levels and the user of the standard provide input
about their experience using it. At this stage, the document is submitted to the W3C
Advisory Council for final approval and (iv) W3C Recommendation (WR) the most mature
stage of development and its ready for the community to adopt it.
In the Semantic Web, the metadata schemas and standards allow some level of interoperability
between different agents that are communicating. This schemas and standards work similarly,
in some way or form, to the existing protocols in the Web. Nevertheless, they are not enough
to respond to the complex system that is the Semantic Web. Another way to favour
interoperability in the Semantic Web is with the use of Metadata Application Profiles.
2.3.2 Metadata Application Profile
The remark about the Semantic Web is that one can mix and match vocabularies to fit their current
needs. For example, in RDF code, one could use the DCTERMS to describe the more generic
features about some resource and combine it with FOAF to describe the relations between people.
34 https://www.ietf.org/ 35 http://www.iso.org/iso/home.html 36 https://www.w3.org/
29
Also, if any vocabulary that fits the requested needs cannot be found, one of the remarkable
features that the Semantic Web allows, is to create one. In this situation, one could add to the
DCTERMS or FOAF, another metadata schema that fits the requirements. This is only possible
because similarly to the Web, the Semantic Web is distributed in nature and this means that we
can say things about things even if those things were not created by us.
DCMI developed the Dublin Core Abstract Model (DCAM). The DCAM defines the nature of the
components used and describes how those elements are combined to create information
structures (Powell, Nilsson, & Naeve, 2007). One of these data structures is the Dublin Core
Application Profile (DCAP), and it is described in the Singapore Framework for Dublin Core
Application Profiles, that is currently a DCMI Recommended Resource (Nilsson, Baker, & Johnston,
2008). From this point on, the term Metadata Application Profile (MAP) is going to be used instead
of DCAP. The use of DCAP instead of MAP can lead to misinterpretation about what really is a
DCAP. The name, Dublin Core Application Profile can make a reader think that we are talking about
an application profile that uses Dublin Core elements and, that is not the case, a DCAP or MAP
can use a variety of elements and it is not restricted to Dublin Core terms.
The Singapore Framework is used to design metadata applications for maximum
interoperability and reusability. According to DCMI, there is a set of rules that need to be followed
to develop a MAP:
Functional Requirements: The functional requirements are a mandatory step and it
describes the functions that the Application Profile (AP) will support;
Domain Model: The domain model it is also a mandatory step. This model describes
the entities of the AP and the relation between them;
Description Set Profile (DSP): This is another mandatory step. The aim of this phase
is to evaluate which metadata records are valid to the AP that is being developed;
User guidelines: the optional usage guidelines describe how to apply the AP;
Encoding syntax guidelines: another optional step that defines the profile syntax.
The development of a MAP is a stepwise process where each stage is built from the results of the
previous step. This can be verified through Figure 11 which present the MAP model defined by
DCMI.
The Singapore Framework came to organise a community of PA implementers, explaining
the steps that they should follow to reach the higher level of interoperability.
30
Although the MAP is a crucial resource, organisations need data models that support their
interoperability needs, and because of the complexity of the design, development, and
implementation, this process requires methodological support. The development method for
Metadata Application Profile (Me4MAP) was developed to answer this need. This method is based
on the Singapore Framework (Nilsson et al., 2008) and has as starting point the Rational Unified
Process (RUP) (Kruchten, 2004), a software development process.
Each component of the Singapore Framework is defined on Me4MAP. Using RUP as a
basis, Me4MAP sets the way through the MAP development: it establishes the activities, when they
should take place, how they interconnect, and which deliverables they will bring out (Malta &
Baptista, 2013).
Me4MAP is divided into four phases (see figure 12): Scope Definition, Construction,
Development, and Validation. The phases are transversal to the project development. (Malta &
Baptista, 2013).
Figure 11 - MAP Model applied with the Singapore Framework (Nilsson et al., 2008)
31
Figure 12 - Me4MAP V0.2 phases (Malta & Baptista, 2013)
2.3.3 From Open Data to Linked Open Data
To fully benefit from Open Data, it is critical to put information and data into a context that creates
new knowledge and empowers great services and applications. As LOD facilitates innovation and
knowledge creation from interlinked data, it is an important mechanism for information
management and integration on the Web (Bauer & Kaltenböck, 2012).
The path to transform Open Data into Linked Open Data is best described by Sir Tim
Berners-Lee on its website called 5-star Open Data37.
Figure 13 - 5 Star Open Data Model
37 http://5stardata.info/en/
32
Michael Hausenblas38 adapted the model to explain the cost and benefits for both publishers and
consumers of LOD:
Figure 14 - Cost and benefits of 1-star web data (Bauer & Kaltenböck, 2012)
Figure 15 - Cost and benefits of 2-stars web data (Bauer & Kaltenböck, 2012)
38 http://semanticweb.org/wiki/Michael_Hausenblas
33
Figure 16 - Cost and benefits of 3-stars web data (Bauer & Kaltenböck, 2012)
Figure 17 - Cost and benefits of 4-stars web data (Bauer & Kaltenböck, 2012)
34
Figure 18 - Cost and benefits of 5-stars web data (Bauer & Kaltenböck, 2012)
Another benefit that it is worthy of mentioning is the return from linking data. LOD is about linking
and the semantic value of things. With LOD we can have a global decentralised system, a system
that is capable of creating new relations between subjects, even if those relations are not explicit.
From this, new knowledge can be discovered, relations that we never thought about are now clear
as water. This is one of the benefits of LOD that has not yet been completely explored but, one
could believe that the future walks in this direction.
2.3.4 LOD and Smart Cities
Smart Cities are becoming more and more popular, and each city deploys its own system, which
may not be interoperable with another’s city system. Currently, there is no unified and interoperable
system which could be reused and redeployed in future smart cities (Gyrard & Serrano, 2015).
Since cities have large amounts of data, heterogeneous in nature and with different quality and
security requirements, research on the opening process, data re-engineering, linking, formalisation
and consumption are of primary interest in smart cities (Alani et al., 2007). The semantic web
technologies could be used to address some of these challenges.
The Internet of Things (IoT) is considered an essential aspect of Smart Cities, one could
also say that LOD together with IoT is a key requirement for Smart Cities. If the government and
other organisations would open up their datasets, particular their sensor networks over the Internet
and using LOD, this could indeed “enable” Smart Cities. Sensors expose their data in a structured
35
way and (potentially) linked to other datasets. These two technologies together would facilitate data
integration from multiple heterogeneous sources, enable the development of information filtering
systems, support knowledge discovery tasks and this could result in applications that are not yet
foreseen (Consoli et al., 2014).
In a Smart City, there is a variety of data sources that offer data to be processed. This data
can be about everything that makes a city such as transports, education, health care, crime,
economy and, energy and energy efficiency of buildings like the data from the SusCity project. The
heterogeneity in this data gives rise to challenges in several dimensions. Even devices of the same
type will deliver data in heterogeneous formats or different units of measurements bringing
heterogeneity issues (Bischof et al., 2014). These issues are part of the challenges that need to be
considered when city data is intended to be released to the public. Usually, the data retrieved from
the sensors has to be treated to be release to city’s citizens. Raw data in this approach, may not
be the best approach because of its volume and privacy issues. Aggregated and filtered data can
be more easily discoverable and have more value to the final user. That data is eventually integrated
into higher-level services and applications (Barnaghi, Sheth, & Henson, 2013).
2.3.5 LOD and Smart Cities Initiatives
The focus of a Smart City is urban resource efficiency. LOD can be used to better understand the
use of those resources and to suggest improvements. To do that, the data has to be published
under a LOD perspective and has to be available for everyone’s use. In the following section, some
initiatives regarding LOD and Smart Cities are presented to better understand the relation between
these two subjects.
2.3.5.1 IES CITIES39 40
The IES CITIES is a platform designed to facilitate the development of urban apps that exploit public
data offered by councils and enriched by citizens. This solution address the needs of three main
stakeholders in a city: (i) citizens consuming useful data services in different domains but also
contributing with complementary data to the city, (ii) companies leveraging the JSON-based RESTful
API provided by the IES CITIES to create citizen-centric Linked Data urban apps, and (iii) the City
39 http://iescities.eu/ 40 https://iescities.com/IESCities/
36
Council, using the platform to publicise its urban datasets and track services assembled around
them (López-de-Ipiña, Vanhecke, Peña, De Nies, & Mannens, 2013).
Currently, the cities that are using this platform are Zaragoza, Bristol, Roveroto and
Majadahonda.
The IES CITIES is composed of the following elements: (i) a mobile application, namely the IES
CITIES players, which allows the user to search for available urban apps, based on their location
and filters, (ii) a server that acts as a mediator among urban apps front-ends and back-ends.
The IES CITIES platform works as follows. First, the municipality registers with the IES Cities
server its datasets descriptions. It indicates where the dataset can be located and accessed (URI),
what is the original format of the data (CSV, RDF, XML, and others) and a description of the dataset
expressed in JSON-Schema. Secondly, a developer can find and choose which datasets are of his
interest by browsing the IES Cities dataset repository. The search can be done with the RESTful
JSON-based API, which issues queries over the datasets (JSON data structures) and retrieves
results also in JSON. Thirdly, the developer registers the application in the platform, providing
details such as where the application is (URI), its type (Google play, Local app repository, Web, and
others.) and a description of its functionalities. Finally, end users can search and find the
application.
One important point about the IES CITIES project is that involves citizens in the creation of LOD
which demonstrates how citizens' involvement in the management of a city can be increased by
allowing them to participate actively. These services also show the IES Cities platform’s added
value for public bodies, which can publish open data with a common machine-readable format
(JSON).
37
Figure 19 - Software Components architecture of the IES Cities Platform
In figure 20 it is possible to see the IES Cities from a citizen’s perspective. The Citizen can access
City Datasets (point 1) City applications (point 2). When an application is selected the user can see
its details (point 3) and, finally there is a personal area where the user can register (point 4).
Figure 20 - IES Cities from a citizen’s perspective
Figures 21 and 22 illustrate the IES Cities platform from a municipality perspective. In figure 21
the municipality can manage the users of the platform. This user can be Admin or Developers.
38
Figure 21 - IES Cities user’s manager
Figure 22 shows the process of adding a new dataset. This process can be done by the council
admin.
Figure 22 - IES Cities from the municipally perspective
A developer, in the IES Cities platform, can use the datasets that were published by the city council
to develop an application.
39
Figure 23 - IES Cities from a developer perspective
After creating an App the developer can add the app to the platform.
Figure 24 - IES Cities from a developer perspective
2.3.5.2 STAR-CITY41
Star-City integrates sensor data using a variety of formats and volumes. Star-City is a system that
supports traffic analytics and reasoning for cities with the integration of sensor data. This system,
showcased in Dublin, Ireland, demonstrates how the severity of road traffic congestion can be
smoothly analysed, diagnosed, explored and predicted using semantic web technologies.
This system is used on a daily and real-time basis to understand not only how traffic condition
is evolving over time, but also why.
Star-City completely relies on the W3C Semantic Web stack, e.g., OWL 2 and RDF for
representing the semantics of information and delivering inference outcomes (Lécué et al., 2014).
41 http://researcher.watson.ibm.com/researcher/view_group.php?id=5101
40
Figure 25 - Star City Application
3 Study to be Carried Out
In this chapter, it is presented the conceptualization of the problem of this dissertation and its
underlying objectives and the methodological approach that is going to be used.
3.1 Conceptualization of the problem to be studied and its underlying objectives
SusCity is a project that relies on the concept of smart cities, and many of the data that produces
are about energy and energy efficiency in buildings. A smart city has many strands, and the same
happens with SusCity. It is divided into six WPs, each containing a set of tasks. This dissertation is
going to focus on WP 2 and the task 6 “Publish City Open Data”. In this task, we are going to
research and develop methods for selecting and making data available under a LOD perspective.
This study aims to create an application profile for the SusCity data and to publish the
SusCity under a LOD perspective. This data is about energy and energy efficiency in buildings. To
do this, a research work was carried out, prior to this dissertation, to investigate if Application
Profiles, Properties and Vocabularies regarding energy efficiency in buildings existed. The
conclusion of this study was that such resources were almost non-existing. Hence, the opportunity
41
of this work is to create and publish them on the web for everyone’s use. With this, the aim is to
contribute to the LOD community and to promote semantic interoperability.
To do this work the Me4MAP methodology is going to be used. Certain phases of this
methodology are not going to be implemented because we already know the data that we are going
to deal with.
At the end of the project we aim that the SusCity data is going to be release in LOD for
everyone’s use.
To publish LOD data, the first thing that it is needed to be done is to look for Application
Profiles, Properties and Vocabularies that “fit” that data, in this case, data about energy and energy
efficiency in buildings. For this, a study was conducted, prior to this dissertation, where it was
concluded that these resources do not exist in the required number or do not exist at all.
The lack of Applications Profiles, Properties and Vocabularies about energy and energy
efficiency in buildings, are both the problem and the motivation of this dissertation because this
means that they are going to have to be created. The development of these resources is an
opportunity to contribute to the Semantic Web community by enriching its domain and to make
them available so, when other people want to make their data about energy or energy efficiency in
buildings available under LOD, they can rely on the properties and vocabularies that we have
created. . Another opportunity of this thesis is to improve and promote semantic interoperability.
Because of the creation of Application Profiles, Properties and Vocabularies, computers will now
have the ability to understand the semantic values and semantics of the attributes of the properties
and the relations between different datasets used in energy-related data.
In the creation of Properties and Vocabularies, one should use the technologies that the
Semantic Web relies on such as, but not limited to (i) RDF, that is a framework for expressing
information about resources; (ii) RDFS that took the basic RDF specification and extended it to
support the expression of structured vocabularies (Brickley & Guha, 2014); and (iii) SKOS. Both
RDFS and SKOS are applications of RDF. These techniques are going to be used not only in the
data but also in the data catalogue (MAP).
With all of this in mind, this thesis aims to:
Make the SusCity data available under a LOD perspective;
Develop of a Benchmarking report of LOD platforms;
Develop of an Application Profile for the SusCity data;
Define and implement the SusCity LOD prototype.
42
3.2 Methodological Approach
Currently, there are two different types of research methodologies, quantitative and qualitative.
Quantitative methodologies were originally developed to study phenomena. A phenomenon is an
observable event. These events can be laboratory experiments or numerical methods such as
mathematical modelling or surveys (Myers, 1997). Quantitative techniques collect data through
instruments, where they can perform tests and measurements, or gather information in a checklist.
Only a very limited number of situations can be studied under laboratory conditions,
because of the difficulty to reproduce “real world” situations. For this reason, the Social Sciences
presented the qualitative methodologies. Examples of qualitative methodologies are case studies,
Ethnography, ground theory or Design Science Research (DSR). The techniques used in this type
of methodologies are, for instance, participant observation, interviews, or analysis of documents or
texts (Myers, 1997).
A research work can use both qualitative and quantitative data collection techniques.
Recognising that all these methods, qualitative or quantitative, have limitations, is one way to
neutralise them to perform the triangulation of data. This triangulation consists in combining
various techniques of information gathering in a single research work. The best way to use both
types of techniques is to perform triangulation at the data source (Creswell, 2013).
Me4MAP
The Me4MAP is a development methodology and is the most adequate to the work that is going to
be developed in this dissertation because, the investigation part of this dissertation exists but, is
very small compared to the development part. The investigation part consists in searching of
properties and vocabularies on the Web that fit the needs of the SusCity data and the development
part consist in actually publish the SusCity data under a LOD perspective.
The Me4MAP is the method that is going to be used for this dissertation. The Me4MAP
was developed under the Design Science Research (DSR) methodological approach and, has as
starting points the Singapore framework for MAP and RUP. The aim of this methodology is to
establish a way for the development of a MAP. It institutes which are the activities, when they
should take place, how they interconnect and, which are the deliverables expected; it also makes
suggestions about which techniques should be used to develop these deliverables.
43
Me4MAP is divided into four phases (see fig.22): Scope Definition, Construction,
Development, and Validation. The phases are transversal to the project development. (Malta &
Baptista, 2013).
Figure 26 - Me4MAP phases (Malta & Baptista, 2013)
The planning of the work begins in the first phase “Scope Definition” and, the goal of this phase is
to define de MAP application scope and organise the work. Some part of the Functional
Requirements Singapore Stage 1 is developed in the phase. In the Construction phase, the Domain
Model Singapore Stage 2 development is initiated. Next, in the Description Set Profile (DSP)
Singapore Stage 3 is built in the Development Phase. The development phase is the climax off all
construction done until this moment since de DSP Singapore Stage 3 development work is based
on the Domain Model Singapore Stage 2, and it is the Singapore Stage 3 that defines the MAP.
Finally, in the Validation phase, the MAP is validated. The Guidelines, Singapore stage four and five
are developed throughout the last three steps (Construction, Development and Validation) (Malta
& Baptista, 2013). To have a better understanding of the life-cycle development model for
Me4MAP, see figure 27.
44
Figure 27 - Me2MAP V0.2 life-cycle development model (Malta & Baptista, 2013)
For this dissertation, the functional requirements phase is going to be skipped because
there aren’t any functional requirements defined. Because of this, the work is going to start by de
Domain Model (Singapore Stage 2). This model is used to describe what things the metadata
describes, and the relationships between them and, it is considered to be the blueprint for the
construction of the AP (Coyle & Baker, 2009). To design the Domain Model, the available data
from the project’s partners is going to be analysed, to gain a better understanding of what are the
entities and the relation between them.
45
Figure 28 - Example of a Domain Model (Coyle & Baker, 2009)
After the Domain Model, the next phase is Singapore Stage 3, the Description Set Profile.
The Me4MAP defines the need to develop two mandatory tasks, the Integration Dossier and the
Validation Dossier.
The Integration Dossier is composed by three deliverables (all mandatory): A Detailed Data
Model Diagram, that represents the Domain Model in detail; a State Of The Art Report, that reports
the existing metadata schemas, in this case about energy and energy efficiency in buildings; and
a Document of Integration, this document is built having in mind the detailed domain model and
the state of the art report. In this deliverable, it is shown every attribute, and its constraints
described by the properties of the metadata schemas chosen to represent the project data.
Similarly to the Integration Dossier, the Validation Dossier is also composed by three
mandatory deliverables, a Validation Report, that is developed having in mind the Vision of the
project; a Document of Validation, in which each element of the metadata is populated with the
data that corresponds to the resource and, finally a Questionnaire that aims verify if there is any
data that has no description from the MAP.
The Singapore Stages four and five consists in the development of the Usage Guidelines
and the Syntax Guidelines. The Usage Guidelines aim to respond to the “how” and “why” of the
AP. Ideally, they explain each property and anticipate the decisions that must be made while
creating a metadata record (Coyle & Baker, 2009). The Syntax Guidelines describe “any application
profile-specific syntaxes and/or syntax guidelines” (Coyle & Baker, 2009). The Singapore Stages
46
four and five are optional, but they are going to be developed in this project. They help the future
users of the AP to understand it better and to apply it in an expected way.
3.3 Activities Plan
The goal of this master’s thesis is to create an application profile for the SusCity data and to release the SusCity’s data to the public under a LOD perspective. To achieve this goal, the following tasks will be carried out:
T1. Development of State of the Art report on LOD under the scope of Smart Cities; T2. Study and analysis of fundamentals and technical recommendations of international
organisations under the scope of LOD; T3. Benchmarking report of LOD platforms (results in R1);
o Search for available LOD platforms; o Analyse which platforms are used by the LOD community; o See which LOD platforms are open; o See which platforms support LOD.
T4. Application Profile (results in R2); o Develop the Domain Model; o Develop the Description Set Profile; o Develop the Usage and Syntax Guidelines.
T5. Development and implementation of the SusCity LOD prototype (results in R3); o Implement the prototype in a LOD database and with SPARQL endpoint; o Conduct Laboratory tests;
T6. Writing of the dissertation (results in R4); T7. Writing and submission of a Scientific Article (results in R5).
The following results are expected from the above tasks:
R1. Benchmarking Report: R2. SusCity’s Application Profile: R3. SusCity LOD prototype: R4. Dissertation report. R5. Scientific Article.
This dissertation started in September 2016, and it will be completed in October 2017. In the following table, the deliveries dates are established for the project’s tasks.
47
November January March May July October T1. Development of State of the Art report on LOD under the scope of Smart Cities
X
T2. Study and analysis of fundamentals and technical recommendations of international organisations under the scope of LOD
X
T3. Benchmarking report of LOD platforms
X
T4. Application Profile X T5. Development and implementation of the SusCity LOD prototype
X
T6. Writing of the dissertation X X X X X X T7. Writing and submission of a Scientific Article
X X X X X X
Table 1 - Delivery dates for the dissertation
48
4 Final Considerations
Until now, the work was developed has intended. The development of the State of the Art about
LOD under the scope of Smart Cities and the study and analysis of the technical specification of
international organisations under LOD are completed. The next immediate task is to carry a
Benchmarking report about LOD platforms and continue to follow the activities plan.
49
5 References
1. Alani, H., Dupplaw, D., Sheridan, J., O’Hara, K., Darlington, J., Shadbolt, N., & Tullo, C.
(2007). Unlocking the potential of public sector information with semantic web technology.
Em The semantic web (pp. 708–721). Springer. Obtido de
http://link.springer.com/chapter/10.1007/978-3-540-76298-0_51
2. Alemu, G., Stevens, B., & Ross, P. (2012). Towards a conceptual framework for user‐driven
semantic metadata interoperability in digital libraries: A social constructivist approach.
New Library World, 113 (1/2), 38–54. https://doi.org/10.1108/03074801211199031
3. Antoniou, G., & Harmelen, F. van. (2004). Web Ontology Language: OWL. In P. D. S. Staab &
P. D. R. Studer (Eds.), Handbook on Ontologies (pp. 67–92). Springer Berlin Heidelberg.
https://doi.org/10.1007/978-3-540-24750-0_4
4. Baca, M. (2016). Introduction to Metadata: Third Edition. Getty Publications.
5. Barnaghi, P., Sheth, A., & Henson, C. (2013). From Data to Actionable Knowledge: Big Data
Challenges in the Web of Things [Guest Editors’ Introduction]. IEEE Intelligent Systems,
28(6), 6–11. https://doi.org/10.1109/MIS.2013.142
6. Batty, M., Axhausen, K. W., Giannotti, F., Pozdnoukhov, A., Bazzani, A., Wachowicz, M., …
Portugali, Y. (2012). Smart cities of the future. The European Physical Journal Special
Topics, 214(1), 481–518. https://doi.org/10.1140/epjst/e2012-01703-3
7. Bauer, F., & Kaltenböck, M. (2012). Linked Open Data: The Essentials. Edition Mono.
Obtained from http://www. semantic-web.at/LOD-TheEssentials.pdf
8. Berners-Lee, T. (2006, Julho 27). Linked Data - Design Issues. Obtained 24/01/2017, from
https://www.w3.org/DesignIssues/LinkedData.html
9. Berners-Lee, T., Hendler, J., Lassila, O., & others. (2001). The semantic web. Scientific
american, 284(5), 28–37.
50
10. Bischof, S., Karapantelakis, A., Nechifor, C.-S., Sheth, A. P., Mileo, A., & Barnaghi, P. (2014).
Semantic modelling of smart city data. Obtained from
http://corescholar.libraries.wright.edu/knoesis/572/
11. Bizer, C., Heath, T., & Berners-Lee, T. (2011). Linked Data - The Story So Far. International
Journal on Semantic Web and Information Systems.
12. Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., & Yergeau, F. (2008, November 26).
Extensible Markup Language (XML) 1.0 [W3C Recommendation]. Obtained 24/01/2017
13. Brickley, D., & Guha, R. . (2014, February 25). RDF Schema 1.1 [W3C Recommendation].
Obtained 24/01/2017, from https://www.w3.org/TR/rdf-schema/
14. Caragliu, A., Del Bo, C., & Nijkamp, P. (2011). Smart Cities in Europe. Journal of Urban
Technology, 18(2), 65–82. https://doi.org/10.1080/10630732.2011.601117
15. Chen, L., Zhang, H., Chen, Y., & Guo, W. (2012). Blank Nodes in RDF. Journal of Software,
7(9). https://doi.org/10.4304/jsw.7.9.1993-1999
16. Chourabi, H., Nam, T., Walker, S., Gil-Garcia, J. R., Mellouli, S., Nahon, K., … Scholl, H. J.
(2012). Understanding Smart Cities: An Integrative Framework. Em 2012 45th Hawaii
International Conference on System Science (HICSS) (pp. 2289–2297).
https://doi.org/10.1109/HICSS.2012.615
17. Cohn, J. P. (2008). Citizen Science: Can Volunteers Do Real Research? BioScience, 58(3),
192. https://doi.org/10.1641/B580303
18. Consoli, S., Mongiovì, M., Recupero, D., Peroni, S., Gangemi, A., Nuzzolese, A., & Presutti, V.
(2014). Producing Linked Data for Smart Cities: the case of Catania. Semantic Web
Journal. Obtained from http://www.semantic-web-journal.net/system/files/swj930.pdf
51
19. Coyle, K., & Baker, T. (2009, May 18). Guidelines for Dublin Core Application Profiles [This is
a DCMI Recommended Resource]. Obtained 04/10/2016, from
http://dublincore.org/documents/profile-guidelines/
20. Creswell, J. W. (2013). Research Design: Qualitative, Quantitative, and Mixed Methods
Approaches. SAGE Publications.
21. Cyganiak, R., Wood, D., Lanthaler, M., Klyne, G., Carroll, J. J., & McBride, B. (2014,
February 25). RDF 1.1 Concepts and Abstract Syntax [W3C Recommendation]. Obtained
25/01/2017, from https://www.w3.org/TR/rdf11-concepts/
22. EUROCITIES. (2013). Open Data guidebook.
23. Ferraram, A., Nikolov, A., & Scharffe, F. (2013). Data Linking for the Semantic Web. From
Semantic Web: Ontology and Knowledge Base Enabled Tools, Services and Application.
Information Science Reference.
24. Goodchild, M. F. (2007). Citizens as sensors: the world of volunteered geography.
GeoJournal, 69(4), 211–221. https://doi.org/10.1007/s10708-007-9111-y
25. Greenberg, J. (2005). Understanding Metadata and Metadata Schemes. Cataloging &
Classification Quarterly, 40(3–4), 17–36. https://doi.org/10.1300/J104v40n03_02
26. Greenberg, J., Sutton, S., & Campbell, D. G. (2003). Metadata: A Fundamental Component
of the Semantic Web. Bulletin of the American Society for Information Science and
Technology, 29(4), 16–18. https://doi.org/10.1002/bult.282
27. Gyrard, A., & Serrano, M. (2015). A Unified Semantic Engine for Internet of Things and Smart
Cities: From Sensor Data to End-Users Applications. From 2015 IEEE International
Conference on Data Science and Data Intensive Systems (pp. 718–725).
https://doi.org/10.1109/DSDIS.2015.59
52
28. Irwin, A. (1995). Citizen Science: A Study of People, Expertise and Sustainable Development.
Psychology Press.
29. Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, Adoption Barriers and Myths
of Open Data and Open Government. Information Systems Management, 29(4), 258–
268. https://doi.org/10.1080/10580530.2012.716740
30. Jessup, L. M., & Valacich, J. S. (2008). Information Systems Today: Managing in the Digital
World. Pearson Prentice Hall.
31. Jupp, S. (2010, Janeiro 22). Simple Knowledge Organisation System (SKOS). Obtained
23/02/2017, from http://ontogenesis.knowledgeblog.org/240
32. Klein, M. (2004, Novembro). Change Management for Distributed Ontologies. Obtained from
http://www.cs.vu.nl/~mcaklein/thesis/thesis.pdf
33. Kruchten, P. (2004). The Rational Unified Process: An Introduction. Addison-Wesley
Professional.
34. Lécué, F., Tallevi-Diotallevi, S., Hayes, J., Tucker, R., Bicer, V., Sbodio, M. L., & Tommasi, P.
(2014). STAR-CITY: semantic traffic analytics and reasoning for CITY (pp. 179–188).
ACM Press. https://doi.org/10.1145/2557500.2557537
35. López-de-Ipiña, D., Vanhecke, S., Peña, O., De Nies, T., & Mannens, E. (2013). Citizen-
Centric Linked Data Apps for Smart Cities. Em UCAmI (pp. 70–77). Springer. Obtained
from http://link.springer.com/content/pdf/10.1007%252F978-3-319-03176-
7.pdf#page=84
36. Malta, M. C. (2014, September 25). Methodological contribution to the development of
application profiles in the context of the semantic Web. Minho’s University. Obtained
from http://repositorium.sdum.uminho.pt/handle/1822/30262?mode=full
53
36. Malta, M. C., & Baptista, A. A. (2013). A method for the development of dublin core
application profiles (Me4DCAP V0. 2): detailed description. Em Proceedings of the 2013
International Conference on Dublin Core and Metadata Applications (pp. 90–103).
Lisbon: Citeseer. Obtained from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.671.405&rep=rep1&type=pd
f
37. Manola, F., Miller, E., McBride, B., Schreiber, G., & Yves, R. (2014, June 24). RDF 1.1
Primer [W3C Working Group Note]. Obtained 25/01/2017, from
https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/
38. Mathiesen, K. (2008). Access to information as a human right. Available at SSRN 1264666.
Obtained from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1264666
39. McKemmish, S., Acland, G., Ward, N., & Reed, B. (1999). Describing Records in Context in
the Continuum: the Australian Recordkeeping Metadata Schema. Archivaria, 48(Issue
Fall).
40. Miles, A., & Bechhofer, S. (2009). SKOS Simple Knowledge Organization System Reference.
Obtido Obtained 23/02/2017, from https://www.w3.org/TR/skos-reference/
41. Myers, M. D. (1997). Qualitative research in information systems. Management Information
Systems Quarterly, 21(2), 241–242.
42. Nam, T., & Pardo, T. A. (2011). Conceptualizing smart city with dimensions of technology,
people, and institutions. Em Proceedings of the 12th Annual International Digital
Government Research Conference: Digital Government Innovation in Challenging Times
(pp. 282–291). ACM. Obtained from http://dl.acm.org/citation.cfm?id=2037602
54
43. Nilsson, M., Baker, T., & Johnston, P. (2008, January 14). The Singapore Framework for
Dublin Core Application Profiles [This is a DCMI Recommended Resource]. Obtained
04/10/2016, from http://dublincore.org/documents/singapore-framework/
44. Niso, P. (2004). NISO Press Booklets.
45. Ossenbruggen, J. van, Hardman, L., & Rutledge, L. (2006). Hypermedia and the Semantic
Web: A Research Agenda. Journal of Digital Information, 3(1). Obtained from
https://journals.tdl.org/jodi/index.php/jodi/article/view/78
46. Pan, G., Qi, G., Zhang, W., Li, S., Wu, Z., & Yang, L. (2013). Trace analysis and mining for
smart cities: issues, methods, and applications. IEEE Communications Magazine, 121.
Obtained from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.307.9769&rep=rep1&type=
47. Pérez, J., Arenas, M., & Gutierrez, C. (2009). Semantics and complexity of SPARQL. ACM
Transactions on Database Systems (TODS), 34(3).
48. Powell, A., Nilsson, M., & Naeve, A. (2007, Junho 4). DCMI Abstract Model [This is a DCMI
Recommendation]. Obtained 06/10/2016, from
http://dublincore.org/documents/2007/06/04/abstract-model/
49. Ramalho, J. L. B. R. (2015). Smart Cities-Fazer uma Avaliação do Estado da Arte do Conceito
e Hierquizar, Com Base Numa Metodologia de Decisão, as Medidas a Implementar no
Território de Intervenção da Energaia. Obtained from https://repositorio-
aberto.up.pt/handle/10216/80465
50. Rühle, S., Baker, T., & Johnston, P. (2011, Junho 9). User Guide - DCMI_MediaWiki.
Obtained 27/01/2017, from http://wiki.dublincore.org/index.php/User_Guide
55
51. Sanchez, L., Muñoz, L., Galache, J. A., Sotres, P., Santana, J. R., Gutierrez, V., … Pfisterer,
D. (2014). SmartSantander: IoT experimentation over a smart city testbed. Computer
Networks, 61, 217–238. https://doi.org/10.1016/j.bjp.2013.12.020
52. Shadbolt, N., Berners-Lee, T., & Hall, W. (2006). The Semantic Web Revisited. IEEE
Intelligent Systems, 21(3), 96–101. https://doi.org/10.1109/MIS.2006.62
53. Staab, S., & Studer, R. (Eds.). (2009). Handbook on Ontologies. Berlin, Heidelberg: Springer
Berlin Heidelberg. https://doi.org/10.1007/978-3-540-92673-3
54. van der Aalst, W. M., & Stahl, C. (2011). Modeling business processes: a petri net-oriented
approach. MIT press.
55. Wang, D., Amin, M. T., Li, S., Abdelzaher, T., Kaplan, L., Gu, S., … Le, H. (2014). Using
humans as sensors: An estimation-theoretic perspective. Em IPSN-14 Proceedings of the
13th International Symposium on Information Processing in Sensor Networks (pp. 35–
46). https://doi.org/10.1109/IPSN.2014.6846739