real-time linked dataspaces978-3-030-29665... · 2019-11-18 · platforms based on dataspaces with...

Real-time Linked Dataspaces

Edward Curry

Real-time Linked Dataspaces
Enabling Data Ecosystems for Intelligent Systems

Edward Curry
National University of Ireland Galway
Galway, Ireland

ISBN 978-3-030-29664-3
ISBN 978-3-030-29665-0 (eBook)
https://doi.org/10.1007/978-3-030-29665-0

This book is an open access publication.

© The Editor(s) (if applicable) and The Author(s) 2020

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this book are included in the book’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Meg, Liam, and Roisin

To Mum and Dad

Foreword

In Lewis Carroll’s Through the Looking-Glass, the Red Queen explained to Alice the nature of Looking-Glass Land: “Now, here, you see, it takes all the running you can do, to keep in the same place.” Van Valen coined the “Red Queen” hypothesis, in which populations have to “run” an evolutionary race in order to stay in the same place, or else go extinct. Within our Digital Universe, we have a Digital Red Queen, with a race between our ability to create masses of data and our ability to manage it effectively and efficiently.

Over the last quarter of a century, we have both run this race in our work, from the corporate worlds of Google and Verizon to the ivory towers of MIT and the University of Washington. We were both extremely interested in the problems of data integration at scale within ecosystems and have proposed approaches for doing so. Naturally, we felt more than a little curious to see what Ed and his team had produced.

In the decade since our original works on ecosystems and dataspaces, we have seen new data management needs arise from the mass migration of applications from batch processing paradigms to real-time processing. This sets the scene for the book as it introduces “Real-time” dataspaces to enable data flows within ecosystems of intelligent systems. Within these covers, you will find a technical vision, new techniques, and deep insight into both the theory and practice of dataspaces for real-time data. The book takes us on a journey from the lab to the field: it develops pioneering best-effort techniques for real-time data management and validates their use within a well-chosen application domain, not just resource management but specifically sustainability. This body of work illustrates how the dataspace paradigm has evolved, and the transformative potential of leveraging data ecosystems to drive value within intelligent systems. The work goes beyond a purely technical perspective and exposes the critical social and organizational aspects of managing data ecosystems for the collective benefit of the participants.

We are delighted to write this foreword to a book that will influence thinking on the design of data infrastructures as a key enabler of data ecosystems, intelligent systems, and smart environments. It sets out a clear path for the design of data platforms based on dataspaces with support for best-effort real-time data processing techniques. This impressive body of work illustrates the power that data-driven systems have to improve the sustainability of our planet’s complex ecosystems, both Natural and Digital.

Michael L. Brodie, MIT, Cambridge, MA, USA
Alon Halevy, Facebook AI, Menlo Park, CA, USA
August 2019

Preface

Around 2012 I started to investigate the potential of data-driven intelligent systems for sustainability. I was (and still am) very motivated by the potential of the Internet of Things (IoT), data analytics, and artificial intelligence to create intelligent systems that can contribute to a sustainable society. As a computer scientist with a background in distributed systems and data management, I felt I could make a modest contribution to the design and construction of these intelligent systems. There was significant potential for data-driven and artificial intelligence techniques to power intelligent systems for sustainability. However, for these approaches to be viable, they would need to be cost-effective and deployable. From working with my industrial collaborators, it was clear that a critical barrier to the adoption of intelligent systems was, and still is, the high upfront cost associated with data sharing and integration. For decades we have seen the consequences of data silos within enterprises, with estimates of 50–80% of the costs of data projects going to data integration and preparation activities. This limits large-scale data management projects to large organisations that have the necessary expertise and resources. This needs to change if we want a broad effort for sustainability that enables smaller stakeholders to engage with and leverage the value available in data.

Datafication driven by IoT-based digital infrastructure is leading to an ecosystem of data which can be exploited to transform our world. Typically, IoT data has the most value when it can be processed on-the-fly and with low latency. However, the current wave of datafication is leading to increasing “data silos.” In 2012, the IoT was predicted to have 25 billion connected devices by 2020; current estimates are now for over 75 billion connected devices by 2025. What was evident in 2012, and is even more apparent today, is that we need a fundamental transformation in how we manage the data ecosystem surrounding intelligent systems in smart environments. Traditional approaches to data management will not be sufficient. We need a paradigm shift as significant as the move to relational data management in the 1970s to provide an alternative to the current top-down, centralised models of data management.

In the first two decades of the twenty-first century, a recognition emerged among researchers and practitioners that a new class of information management and processing systems was needed to support diverse distributed real-time applications. Michael Brodie has been a prophet of the data integration challenges within ecosystems where thousands of semantically heterogeneous databases need to be managed and integrated collectively. These information ecosystems necessitate a transformation in how data is managed and shared among intelligent systems. Halevy, Franklin, and Maier recognised that in large-scale integration scenarios involving thousands of data sources (such as ecosystems), it is difficult and expensive to obtain an upfront unifying schema across all sources. They introduced the paradigm of Dataspaces, which shifts the emphasis to supporting the co-existence of heterogeneous data without requiring a significant upfront investment in a unifying schema. The concepts of ecosystems and dataspaces were absorbing, and I was excited by the potential of “best-effort” approaches. I felt there might be a connection to the Pareto Principle.

The Pareto Principle (or the 80/20 rule) has wide application in many areas, from economics and market analysis to business strategy, where it has been observed that 20% of the effort delivers 80% of the results. Within computer science, this principle has been observed in many problems, from fixing bugs to writing code. The principle can help us to prioritise actions, for example, to focus on the 20% of software bugs that cause 80% of the system crashes. The power of the principle has always fascinated me, and the dataspace paradigm can unlock its power within the data realm. The pay-as-you-go model allows participants in the dataspace to focus on high-value data and tackle the “long tail” of data on an as-needed basis. This was the genesis of this work.

This book explores the dataspace paradigm as an alternative best-effort approach to data management within data ecosystems. It establishes the theoretical foundations and principles of Real-time Linked Dataspaces as a data platform for intelligent systems, and introduces a set of specialised best-effort techniques and models to enable loose administrative proximity and semantic integration for managing and processing events and streams.

Readers of this book will gain a detailed understanding of how the dataspace paradigm is used to enable data ecosystems for intelligent systems within smart environments. The book takes the reader from the fundamental theory and the new techniques needed for support services to the experience gained from delivering real-world intelligent systems for smart cities, buildings, energy, water, and mobility.

The book is of interest to three key audiences. First are researchers and graduate students in the fields of data management, big data, IoT, and intelligent systems with an interest in state-of-the-art techniques for approximate and best-effort approaches to incremental data management. Second, the book provides useful insights to practitioners who need to create advanced data management platforms for intelligent systems, smart environments, and data ecosystems. Practitioners will learn about designing incremental data management architectures and techniques that are grounded in theory and informed by the experience of rigorous deployments within real-world settings. Third, researchers and practitioners involved in interdisciplinary and transdisciplinary “Smart” projects will gain insights into the design and operation of data-intensive socio-technical intelligent systems.

The book is structured as follows. Part I: Fundamentals and Concepts details the motivation and core concepts of Real-time Linked Dataspaces. This part establishes the need for an evolution of data management techniques to meet the challenges of enabling a data ecosystem for intelligent systems within smart environments. It details the fundamental concepts of Dataspaces and the need for specialisation for processing dynamic real-time data. Part II: Data Support Services explores the design and evaluation of critical services within the Real-time Linked Dataspace, including catalog, entity management, query and search, data service discovery, and human-in-the-loop. Part III: Stream and Event Processing Services details the design and evaluation of the specialised techniques created for real-time support services, including complex event processing, event service composition, stream dissemination, stream matching, and approximate semantic matching. Part IV: Intelligent Systems and Applications explores the use of Real-time Linked Dataspaces within real-world smart environments, demonstrating their role in enabling intelligent water and energy management systems through the development of IoT-enabled digital twins, enhanced user experience, and autonomic source selection for advanced predictive analytics. Finally, Part V: Future Directions details research challenges for dataspaces, data ecosystems, and intelligent systems.

Forward-thinking societies will see the provision of digital infrastructure as a shared societal service in the same way as water, sanitation, and healthcare. With few exceptions, our current large-scale data infrastructures are beyond the reach of small organisations that cannot deal with the complexity of data management and the high costs associated with data infrastructure. It is clear we desperately need new approaches to support the complex data ecosystems our “smart” society is creating. This vision demands a fundamental shift in how we design large-scale data ecosystem infrastructure to unlock the power of a Pareto effect for data. I believe this book is a step in that direction.

Galway, Ireland
Edward Curry
October 2019

Acknowledgements

I want to thank all my fantastic students André, Feng, Souleiman, Umair, Wassim, Eanna, Ninad, and Willem, who have shared this journey with me as we created many of the ideas and concepts presented in this book. It has been an honour for me to be their guide at the start of their scientific journey, and I am richer for the experience. My collaborators have been a critical part of this work. I want to thank Yongrui “Louie” Qin, Quan Z. “Michael” Sheng, and the project teams from Waternomics and SENSE. In particular, I would like to thank Sean and Adeboyega for their encouragement and helpful discussion over the years. I want to thank Stefan, Manfred, Dietrich, and Mathieu for creating and nurturing a fantastic environment for ideas to germinate and blossom into life, and for always encouraging us to think big, be brave, and deliver impact.

I gratefully acknowledge the funders of this work; it was supported in part by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, the European Commission’s Seventh Framework Programme from ICT grant agreement WATERNOMICS no. 619660, Enterprise Ireland, the National Development Plan, and the European Union under Grant Number IP/2012/0188.

I want to thank Niki, Asra, Piyush, Tarek, Atiya, Amin, and Edobor for their support in the preparation of the final manuscript. Thanks also go to Ralf Gerstner and all at Springer for their professionalism and assistance.

I want to thank my parents Edward and Mary, my sisters Tara and Alison, my brother Joe, and my parents-in-law Hilda and Roger. Finally, and most importantly, I would like to thank my wife, Meg, for her support, encouragement, and unwavering love; my son Liam for all the “Great Ideas” he gave his daddy; and Roisin, who bloomed into a beautiful little rose as I wrote this book. You make everything worthwhile.

Galway, Ireland
Edward Curry

Contents

Part I Fundamentals and Concepts

1 Real-time Linked Dataspaces: A Data Platform for Intelligent Systems Within Internet of Things-Based Smart Environments . . . 3
1.1 Introduction . . . 3
1.2 Foundations . . . 4
1.2.1 Intelligent Systems . . . 5
1.2.2 Smart Environments . . . 5
1.2.3 Internet of Things . . . 6
1.2.4 Data Ecosystems . . . 7
1.2.5 Enabling Data Ecosystem for Intelligent Systems . . . 8
1.3 Real-time Linked Dataspaces . . . 9
1.4 Book Overview . . . 11
1.5 Summary . . . 13

2 Enabling Knowledge Flows in an Intelligent Systems Data Ecosystem . . . 15
2.1 Introduction . . . 15
2.2 Foundations . . . 15
2.2.1 Intelligent Systems Data Ecosystem . . . 17
2.2.2 System of Systems . . . 18
2.2.3 From Deterministic to Probabilistic Decisions in Intelligent Systems . . . 19
2.2.4 Digital Twins . . . 21
2.3 Knowledge Exchange Between Open Intelligent Systems in Dynamic Environments . . . 22
2.4 Knowledge Value Ecosystem (KVE) Framework . . . 24
2.5 Knowledge: Transfer and Translation . . . 26
2.5.1 Entity-Centric Data Integration . . . 27
2.5.2 Linked Data . . . 27

2.5.3 Knowledge Graphs . . . 29
2.5.4 Smart Environment Example . . . 31
2.6 Value: Continuous and Shared . . . 32
2.6.1 Value Disciplines . . . 32
2.6.2 Data Network Effects . . . 33
2.7 Ecosystem: Governance and Collaboration . . . 34
2.7.1 From Ecology and Business to Data . . . 34
2.7.2 The Web of Data: A Global Data Ecosystem . . . 35
2.7.3 Ecosystem Coordination . . . 35
2.7.4 Data Ecosystem Design . . . 37
2.8 Iterative Boundary Crossing Process: Pay-As-You-Go . . . 38
2.8.1 Dataspace Incremental Data Management . . . 38
2.9 Data Platforms for Intelligent Systems Within IoT-Based Smart Environment . . . 39
2.9.1 FAIR Data Principles . . . 39
2.9.2 Requirements Analysis . . . 40
2.10 Summary . . . 43

3 Dataspaces: Fundamentals, Principles, and Techniques . . . 45
3.1 Introduction . . . 45
3.2 Big Data and the Long Tail of Data . . . 46
3.3 The Changing Cost of Data Management . . . 47
3.4 Approximate, Best-Effort, and “Good Enough” Information . . . 49
3.5 Fundamentals of Dataspaces . . . 51
3.5.1 Definition and Principles . . . 51
3.5.2 Comparison to Existing Approaches . . . 53
3.6 Dataspace Support Platform . . . 53
3.6.1 Support Services . . . 54
3.6.2 Life Cycle . . . 55
3.6.3 Implementations . . . 56
3.7 Dataspace Technical Challenges . . . 57
3.7.1 Query Answering . . . 57
3.7.2 Introspection . . . 59
3.7.3 Reusing Human Attention . . . 60
3.8 Dataspace Research Challenges . . . 60
3.9 Summary . . . 61

4 Fundamentals of Real-time Linked Dataspaces . . . 63
4.1 Introduction . . . 63
4.2 Event and Stream Processing for the Internet of Things . . . 64
4.2.1 Timeliness and Real-time Processing . . . 65
4.3 Fundamentals of Real-time Linked Dataspaces . . . 66
4.3.1 Foundations . . . 67
4.3.2 Definition and Principles . . . 68
4.3.3 Comparison . . . 70
4.3.4 Architecture . . . 70

4.4 A Principled Approach to Pay-As-You-Go Data Management . . . 72
4.4.1 TBL’s 5 Star Data . . . 72
4.4.2 5 Star Pay-As-You-Go Model for Dataspace Services . . . 73
4.5 Support Platform . . . 74
4.5.1 Data Services . . . 74
4.5.2 Stream and Event Processing Services . . . 74
4.6 Suitability as a Data Platform for Intelligent Systems Within IoT-Based Smart Environments . . . 76
4.6.1 Common Data Platform Requirements . . . 76
4.6.2 Related Work . . . 77
4.7 Summary . . . 79

Part II Data Support Services

5 Data Support Services for Real-time Linked Dataspaces . . . 83
5.1 Introduction . . . 83
5.2 Pay-As-You-Go Data Support Services for Real-time Linked Dataspaces . . . 83
5.3 5 Star Pay-As-You-Go Levels for Data Services . . . 85
5.4 Summary . . . 85

6 Catalog and Entity Management Service for Internet of Things-Based Smart Environments . . . 89
6.1 Introduction . . . 89
6.2 Working with Entity Data . . . 90
6.3 Catalog and Entity Service Requirements for Real-time Linked Dataspaces . . . 90
6.3.1 Real-time Linked Dataspaces . . . 91
6.3.2 Requirements . . . 91
6.4 Analysis of Existing Data Catalogs . . . 93
6.5 Catalog Service . . . 95
6.5.1 Pay-As-You-Go Service Levels . . . 95
6.6 Entity Management Service . . . 96
6.6.1 Pay-As-You-Go Service Levels . . . 98
6.6.2 Entity Example . . . 98
6.7 Access Control Service . . . 99
6.7.1 Pay-As-You-Go Service Levels . . . 100
6.8 Joining the Real-time Linked Dataspace . . . 101
6.9 Summary . . . 103

7 Querying and Searching Heterogeneous Knowledge Graphs in Real-time Linked Dataspaces . . . 105
7.1 Introduction . . . 105
7.2 Querying and Searching in Real-time Linked Dataspaces . . . 106
7.2.1 Real-time Linked Dataspaces . . . 106
7.2.2 Knowledge Graphs . . . 107

7.2.3 Searching Versus Querying . . . 108
7.2.4 Search and Query Service Pay-As-You-Go Service Levels . . . 109
7.3 Search and Query over Heterogeneous Data . . . 110
7.3.1 Data Heterogeneity . . . 110
7.3.2 Motivational Scenario . . . 111
7.3.3 Core Requirements for Search and Query . . . 112
7.4 State-of-the-Art Analysis . . . 113
7.4.1 Information Retrieval Approaches . . . 113
7.4.2 Natural Language Approaches . . . 116
7.4.3 Discussion . . . 118
7.5 Design Features for Schema-Agnostic Queries . . . 120
7.6 Summary . . . 124

8 Enhancing the Discovery of Internet of Things-Based Data Services in Real-time Linked Dataspaces . . . 125
8.1 Introduction . . . 125
8.2 Discovery of Data Services in Real-time Linked Dataspaces . . . 126
8.2.1 Real-time Linked Dataspaces . . . 126
8.2.2 Data Service Discovery . . . 126
8.3 Semantic Approaches for Service Discovery . . . 127
8.3.1 Inheritance Between OWL-S Services . . . 128
8.3.2 Topic Extraction and Formal Concept Analysis . . . 128
8.3.3 Reasoning-Based Matching . . . 129
8.3.4 Numerical Encoding of Ontological Concepts . . . 129
8.3.5 Discussion . . . 130
8.4 Formal Concept Analysis for Organizing IoT Data Service Descriptions . . . 131
8.4.1 Definition: Formal Context . . . 132
8.4.2 Definition: Formal Concept . . . 132
8.4.3 Definition: Sub-concept Ordering . . . 133
8.5 IoT-Enabled Smart Environment Use Case . . . 134
8.6 Conclusions and Future Work . . . 137

9 Human-in-the-Loop Tasks for Data Management, Citizen Sensing, and Actuation in Smart Environments . . . 139
9.1 Introduction . . . 139
9.2 The Wisdom of the Crowds . . . 140
9.2.1 Crowdsourcing Platform . . . 141
9.3 Challenges of Enabling Crowdsourcing . . . 142
9.4 Approaches to Human-in-the-Loop . . . 144
9.4.1 Augmented Algorithms and Operators . . . 144
9.4.2 Declarative Programming . . . 145
9.4.3 Generalised Stand-alone Platforms . . . 145

9.5 Comparison of Existing Approaches . . . 146
9.6 Human Task Service for Real-time Linked Dataspaces . . . 148
9.6.1 Real-time Linked Dataspaces . . . 148
9.6.2 Human Task Service . . . 149
9.6.3 Pay-As-You-Go Service Levels . . . 149
9.6.4 Applications of Human Task Service . . . 150
9.6.5 Data Processing Pipeline . . . 152
9.6.6 Task Data Model for Micro-tasks and Users . . . 153
9.6.7 Spatial Task Assignment in Smart Environments . . . 154
9.7 Summary . . . 158

Part III Stream and Event Processing Services

10 Stream and Event Processing Services for Real-time Linked Dataspaces . . . 161
10.1 Introduction . . . 161
10.2 Pay-As-You-Go Services for Event and Stream Processing in Real-time Linked Dataspaces . . . 161
10.3 Entity-Centric Real-time Query Service . . . 164
10.3.1 Lambda Architecture . . . 164
10.3.2 Entity-Centric Real-time Query Service . . . 165
10.3.3 Pay-As-You-Go Service Levels . . . 166
10.3.4 Service Performance . . . 166
10.4 Summary . . . 167

11 Quality of Service-Aware Complex Event Service Composition in Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11.2 Complex Event Processing in Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
11.2.1 Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . 170
11.2.2 Complex Event Processing . . . . . . . . . . . . . . . . . . . . 170
11.2.3 CEP Service Design . . . . . . . . . . . . . . . . . . . . . . . . . 172
11.2.4 Pay-As-You-Go Service Levels . . . . . . . . . . . . . . . . . 173
11.2.5 Event Service Life Cycle . . . . . . . . . . . . . . . . . . . . . 173

11.3 QoS Model and Aggregation Schema . . . . . . . . . . . . . . . . . . . 175
11.3.1 QoS Properties of Event Services . . . . . . . . . . . . . . . 175
11.3.2 QoS Aggregation and Utility Function . . . . . . . . . . . . 176
11.3.3 Event QoS Utility Function . . . . . . . . . . . . . . . . . . . . 177

11.4 Genetic Algorithm for QoS-Aware Event Service Composition Optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
11.4.1 Population Initialisation . . . . . . . . . . . . . . . . . . . . . . 178
11.4.2 Genetic Encodings for Concrete Composition Plans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
11.4.3 Crossover and Mutation Operations . . . . . . . . . . . . . . 179

11.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
11.5.1 Part 1: Performance of the Genetic Algorithm . . . . . . 181
11.5.2 Part 2: Validation of QoS Aggregation Rules . . . . . . . 186

11.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
11.6.1 QoS-Aware Service Composition . . . . . . . . . . . . . . . 189
11.6.2 On-Demand Event/Stream Processing . . . . . . . . . . . . 189

11.7 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 190

12 Dissemination of Internet of Things Streams in a Real-time Linked Dataspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
12.2 Internet of Things: A Dataspace Perspective . . . . . . . . . . . . . . 192

12.2.1 Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . 192
12.3 Stream Dissemination Service . . . . . . . . . . . . . . . . . . . . . . . . 193

12.3.1 Pay-As-You-Go Service Levels . . . . . . . . . . . . . . . . . 194
12.4 Point-to-Point Linked Data Stream Dissemination . . . . . . . . . . 195

12.4.1 TP-Automata for Pattern Matching . . . . . . . . . . . . . . 196
12.5 Linked Data Stream Dissemination via Wireless Broadcast . . . 197

12.5.1 The Mapping Between Triples and 3D Points . . . . . . . 198
12.5.2 3D Hilbert Curve Index . . . . . . . . . . . . . . . . . . . . . . 198

12.6 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.6.1 Evaluation of Point-to-Point Linked Stream Dissemination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.6.2 Evaluation on Linked Stream Dissemination via Wireless Broadcast . . . . . . . . . . . . . . . . . . . . . . . 204
12.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

12.7.1 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
12.7.2 Wireless Broadcast . . . . . . . . . . . . . . . . . . . . . . . . . . 206

12.8 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 207

13 Approximate Semantic Event Processing in Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
13.2 Approximate Event Matching in Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
13.2.1 Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . 210
13.2.2 Event Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

13.3 The Approximate Semantic Matching Service . . . . . . . . . . . . . 212
13.3.1 Pay-As-You-Go Service Levels . . . . . . . . . . . . . . . . . 212
13.3.2 Semantic Matching Models . . . . . . . . . . . . . . . . . . . . 213
13.3.3 Model I: The Approximate Event Matching Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
13.3.4 Model II: The Thematic Event Matching Model . . . . . 215

13.4 Elements for Approximate Semantic Matching of Events . . . . . 215
13.4.1 Elm 1: Sub-symbolic Distributional Event Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
13.4.2 Elm 2: Free Event Tagging . . . . . . . . . . . . . . . . . . . . 216
13.4.3 Elm 3: Approximation . . . . . . . . . . . . . . . . . . . . . . . 216
13.4.4 Elements Within the Event Flow Functional Model . . 216

13.5 Instantiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
13.5.1 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
13.5.2 Subscriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
13.5.3 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

13.6 Evaluation and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
13.6.1 Evaluation of the Approximate Semantic Event Matching Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
13.6.2 Evaluation of the Thematic Event Matching Model . . . 221

13.7 State-of-the-Art Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
13.8 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 224

Part IV Intelligent Systems and Applications

14 Enabling Intelligent Systems, Applications, and Analytics for Smart Environments Using Real-time Linked Dataspaces . . . . . 229
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
14.2 Intelligent Energy and Water Management . . . . . . . . . . . . . . . 229
14.3 Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . . . . . . . . 230
14.4 Smart Environment Pilot Deployments . . . . . . . . . . . . . . . . . . 231

14.4.1 Smart Airport (Linate, Milan) . . . . . . . . . . . . . . . . . . 231
14.4.2 Smart Office (Galway, Ireland) . . . . . . . . . . . . . . . . . 232
14.4.3 Smart Homes (Municipality of Thermi, Greece) . . . . . 232
14.4.4 Mixed Use (Galway, Ireland) . . . . . . . . . . . . . . . . . . 232
14.4.5 Smart School (Galway, Ireland) . . . . . . . . . . . . . . . . . 233
14.4.6 Target Users Groups . . . . . . . . . . . . . . . . . . . . . . . . . 233

14.5 Enabling Intelligent Systems, Applications, and Analytics for Smart Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

14.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

15 Autonomic Source Selection for Real-time Predictive Analytics Using the Internet of Things and Open Data . . . . . . . . . . . . . . . . . . . . . . . 237
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
15.2 Source Selection for Analytics in Dataspaces . . . . . . . . . . . . . 238

15.2.1 Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . 238
15.2.2 Internet of Things Source Selection Challenges . . . . . 239

15.3 Autonomic Source Selection Service for Real-time Predictive Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
15.3.1 Autonomic Source Selection . . . . . . . . . . . . . . . . . . . 240
15.3.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
15.3.3 Prediction Models . . . . . . . . . . . . . . . . . . . . . . . . . . 243

15.4 Autonomic Source Selection Workflow . . . . . . . . . . . . . . . . . 245
15.4.1 4-Step Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
15.4.2 Reselection Triggers . . . . . . . . . . . . . . . . . . . . . . . . . 247

15.5 Evaluation Within Intelligent Systems . . . . . . . . . . . . . . . . . . 249
15.5.1 Wind Farm Energy Prediction (Belgium) . . . . . . . . . . 249
15.5.2 Building Energy Prediction (Galway, Ireland) . . . . . . 251

15.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

16 Building Internet of Things-Enabled Digital Twins and Intelligent Applications Using a Real-time Linked Dataspace . . . . . . . . . . . . . . 255
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
16.2 Digital Twins and Intelligent Applications with a Real-time Linked Dataspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
16.2.1 Real-time Linked Dataspaces . . . . . . . . . . . . . . . . . . 256
16.2.2 Digital Twins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
16.2.3 The OODA Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

16.3 Enabling OODA for Digital Twins and Intelligent Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
16.3.1 Observation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
16.3.2 Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
16.3.3 Decision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
16.3.4 Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

16.4 Smart Energy and Water Pilots . . . . . . . . . . . . . . . . . . . . . . . . 263
16.4.1 Energy and Water Savings . . . . . . . . . . . . . . . . . . . . 264
16.4.2 Human Task Service Evaluation . . . . . . . . . . . . . . . . 265

16.5 Experiences and Lessons Learnt . . . . . . . . . . . . . . . . . . . . . . . 268
16.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

17 A Model for Internet of Things Enhanced User Experience in Smart Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
17.2 A Model for Internet of Things Enhanced User Experience . . . 272

17.2.1 Digitalisation: IoT and Big Data . . . . . . . . . . . . . . . . 273
17.2.2 Human–Computer Interaction: IoT-Enhanced User Experience and Behavioural Models . . . . . . . . . . . . . 273
17.3 An IoT-Enhanced Journey for Smart Energy and Water . . . . . . 274

17.3.1 Digital: Real-time Linked Dataspace . . . . . . . . . . . . . 274
17.3.2 HCI: A User’s Journey to Sustainability Using the Transtheoretical Model . . . . . . . . . . . . . . . . . . . . 275
17.4 TTM Intelligent Applications . . . . . . . . . . . . . . . . . . . . . . . . . 279

17.4.1 Promotional Homepage . . . . . . . . . . . . . . . . . . . . . . . 279
17.4.2 Dashboard Tour . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
17.4.3 Sense Dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . 282

17.5 User Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
17.5.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
17.5.2 Impact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

17.6 Insights and Experience Gained . . . . . . . . . . . . . . . . . . . . . . . 291
17.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

Part V Future Directions

18 Future Research Directions for Dataspaces, Data Ecosystems, and Intelligent Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
18.2 Dataspaces: From Proof-of-Concept to Widespread Adoption . . 298
18.3 Research Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

18.3.1 Large-Scale Decentralised Support Services . . . . . . . . 299
18.3.2 Multimedia/Knowledge-Intensive Event Processing . . . 300
18.3.3 Trusted Data Sharing . . . . . . . . . . . . . . . . . . . . . . . . 301
18.3.4 Ecosystem Governance and Economic Models . . . . . . 302
18.3.5 Incremental Intelligent Systems Engineering: Cognitive Adaptability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
18.3.6 Towards Human-Centric Systems . . . . . . . . . . . . . . . 303

18.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304

Credits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
