A Transformation from ORM Conceptual Models to Neo4j Graph Database
Radboud University Nijmegen Master Thesis A Transformation from ORM Conceptual Models to Neo4j Graph Database Author: Marios Braimniotis Supervisor: Dr. Th.P. van der Weide A thesis submitted in fulfilment of the requirements for the degree of Master of Science in the Information Sciences Institute of Computing and Information Sciences June 2017


Page 1: A Transformation from ORM Conceptual Models to Neo4j Graph Database

Radboud University Nijmegen

Master Thesis

A Transformation from ORM Conceptual Models to Neo4j Graph Database

Author: Marios Braimniotis

Supervisor: Dr. Th.P. van der Weide

A thesis submitted in fulfilment of the requirements for the degree of Master of Science

in the

Information Sciences

Institute of Computing and Information Sciences

June 2017


Declaration of Authorship

I, Marios Braimniotis, declare that this thesis titled, 'A Transformation from ORM Conceptual Models to Neo4j Graph Database' and the work presented in it are my own. I confirm that:

This work was done wholly or mainly while in candidature for a master degree at Radboud University of Nijmegen.

Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

Where I have consulted the published work of others, this is always clearly attributed.

Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

I have acknowledged all main sources of help.

Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:


RADBOUD UNIVERSITY NIJMEGEN

Abstract

Faculty of Science

Institute of Computing and Information Sciences

Master of Science

A Transformation from ORM Conceptual Models to Neo4j Graph Database

by Marios Braimniotis

An information system is best described first at the conceptual level, using a conceptual modelling technique, and then is mapped into a target database model. Several conceptual modelling techniques are available, with the most well-known being Entity-Relationship (ER) modelling and Object-Role Modelling (ORM) with its variations NIAM and PSM. Relational databases have been the most popular data stores in the past few decades, but the advent of Big Data introduced several contemporary database technologies, which can be grouped in four major categories: key-value stores, wide-column stores, document stores and graph databases. This study discusses the transformation process from ORM conceptual models to graph databases, and introduces a framework for this transformation. Transformation rules for the most commonly used ORM information structures are defined, along with the corresponding transformation algorithm that describes the transformation process.


Acknowledgements

First of all, I would like to express my profound gratitude to my thesis advisor Professor Theo van der Weide of the Faculty of Science at Radboud University, Nijmegen. I would like to thank him for his patience and his valuable advice during my research and writing. He consistently steered me in the right direction whenever I ran into a trouble spot or had questions about my research. This accomplishment would not have been possible without him.

I would also like to acknowledge Dr. Patrick van Bommel of the Faculty of Science at Radboud University, Nijmegen as the second reader of this thesis, and I am gratefully indebted to him for his very valuable comments on this thesis.

Finally, I would like to thank my family for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis.


Contents

Declaration of Authorship
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Abbreviations

1 Introduction
1.1 Theoretical Background
1.2 Methodology
1.3 Research Questions
1.4 Related Work
1.5 Thesis outline

2 Conceptual Models
2.1 Information Structures
2.1.1 Fundamentals of information structures
2.1.2 Fact types
2.1.2.1 Unary fact types
2.1.2.2 Binary fact types
2.1.2.3 N-ary fact types
2.1.2.4 Objectified fact types
2.1.2.5 Bridge types
2.1.3 Specialization (Subtyping)
2.1.4 Generalisation
2.1.5 Power Types (or Set Types)
2.1.6 Sequence Types
2.1.7 Schema Types
2.2 Populations
2.2.1 Introduction to the Universe of Discourse (UoD)
2.2.2 Population rules
2.3 Integrity Constraints
2.4 Identification

3 The Graph Data Model
3.1 Graph Databases
3.1.1 The Neo4j Graph Database
3.1.2 The Labeled Property Graph Model
3.1.3 A Comparison to Relational Data Model
3.2 Data Modeling in Neo4j
3.2.1 Modeling Best Practices
3.2.2 Modeling Pitfalls
3.2.3 An Introduction to Cypher Query Language

4 Data Model Transformation
4.1 Transformation Framework
4.2 Transformation Rules
4.2.1 Simple Entity and Fact Types
4.2.1.1 Bridge Types
4.2.1.2 N-aries
4.2.1.3 Complex Identification
4.2.2 Objectifications
4.2.3 Specializations
4.2.4 Generalizations
4.2.5 Power Types (or Set Types)
4.2.6 Transformation Rules Summary
4.3 Transformation Algorithm

5 Validation
5.1 Validation Framework
5.2 Example Transformation
5.2.1 Presidential ORM
5.2.2 Presidential Graph Meta-model
5.2.3 Cypher Statements
5.2.3.1 Data Integrity Constraints
5.2.3.2 Labelled Nodes
5.2.3.3 Relationships
5.3 Validation Queries
5.3.1 Query 1
5.3.2 Query 2
5.3.3 Query 3
5.3.4 Query 4
5.3.5 Query 5
5.3.6 Query 6
5.3.7 Query 7
5.3.8 Query 8
5.3.9 Query 9
5.3.10 Query 10

6 Conclusions and Recommendations

A Graph Data
A.1 Person
A.2 President
A.3 Vice President
A.4 Administration
A.5 Presidency
A.6 Marriage
A.7 Winner
A.8 Score
A.9 Election
A.10 Hobby
A.11 Birth State
A.12 State
A.13 Member
A.14 Party

Bibliography


List of Figures

2.1 Unary fact type example.
2.2 Binary fact type example.
2.3 Ternary fact type example.
2.4 Objectification example.
2.5 Bridge type example.
2.6 Specialization example.
2.7 Specialization hierarchy example.
2.8 Specialization hierarchy with connected subtype.
2.9 Generalization example.
2.10 Power type example.
2.11 Sequence type example.
2.12 Schema type example.

3.1 Neo4j clustering architecture.
3.2 A simple graph model example.
3.3 Relational database for the data center domain.
3.4 Graph database for the data center domain.
3.5 Granulated and not granulated structure example.
3.6 Unconnected graph.
3.7 Node representing multiple concepts.
3.8 Splitting node with multiple concepts.
3.9 A simple graph model.

4.1 Simple entity types transformation.
4.2 Simple entity types identification.
4.3 Transformation of fact type with compound constraint.
4.4 Simple entity types transformation.
4.5 Population violation in optional functional role.
4.6 Optional functional role in graph model.
4.7 Total functional role in graph model.
4.8 ORM bridge type example.
4.9 Bridge type identification.
4.10 Transformation of bridge type example.
4.11 Ternary fact type transformation.
4.12 Ternary fact type in graph model.
4.13 Ternary fact type in graph model.
4.14 ORM complex identification example.
4.15 ORM complex identification example.
4.16 Transformation of complex identification example.
4.17 Objectification example in ORM.
4.18 Objectification identification example.
4.19 Objectification in the graph model.
4.20 Objectification example transformation into graph model.
4.21 Objectification example transformation with minimal redundancies.
4.22 Specialization transformation example.
4.23 Specialization transformation by partition.
4.24 Specialization transformation by separation.
4.25 Specialization transformation by absorption.
4.26 Generalization transformation example.
4.27 Generalization transformation by absorption.
4.28 Adjusted generalization ORM model.
4.29 Generalization transformation by partition.
4.30 Generalization transformation by separation.
4.31 ORM power type example.
4.32 ORM power type example.

5.1 Presidential ORM model.
5.2 Presidential Graph Meta-model.


List of Tables

4.1 Transformation rules

A.1 Person
A.2 President
A.3 Vice President
A.4 Administration
A.5 Presidency
A.6 Marriage
A.7 Winner
A.8 Score
A.9 Election
A.10 Hobby
A.11 Birth State
A.12 State
A.13 Member
A.14 Party


Abbreviations

ACID: Atomicity, Consistency, Isolation, Durability
NIAM: Natural Language Information Analysis Method
NoSQL: Not Only SQL
OLAP: Online Analytical Processing
OLTP: Online Transaction Processing
ORM: Object-Role Modeling
ORC: Object Role Calculus
PSM: Predicator Set Model
RDBMS: Relational Database Management System
SQL: Structured Query Language
CQL: Cypher Query Language
UoD: Universe of Discourse


Chapter 1

Introduction

Information systems development is closely bound to conceptual data modelling: data should first be modelled at the conceptual level and, once an appropriate conceptual model has been obtained, it can be transformed to the external or internal level, as prescribed by the three-level architecture for information systems modelling [66]. Following this approach, we can distinguish two sorts of efficiency: efficiency in terms of language correctness and efficiency in terms of query performance. In this thesis we focus on the former; that is, we try to define an efficient transformation framework from the conceptual to the internal level that translates a conceptual model while maintaining language correctness.

1.1 Theoretical Background

Several conceptual data modelling techniques have been developed during the past decades. Some of them are ER [1, 2], NIAM [3] and PSM [4], which is an extension of NIAM. PSM provides advanced modelling constructs, such as generalization and specialization, power types and sequence types. All these constructs can be represented graphically in PSM without falling short in semantics. Moreover, PSM specifies several simple and complex integrity constraints which can also be represented graphically. In general, PSM has proved to be one of the most expressive conceptual data modelling techniques, and this is the reason that we use it as the conceptual data modelling technique in our information framework. In the three-level architecture for information systems, we refer to internal models, which in essence are database models hosted in database management systems. Normally, such models are bound to some database language. One of the most popular database models during the last few decades has been the well-known relational model [5], which is typically bound to the Structured Query Language [6]. Some of the most popular implementations of relational or object-relational databases that dominate the corresponding market are Oracle Database, MySQL, Microsoft SQL Server and PostgreSQL (object-relational).

However, relational and object-relational databases are not always the most appropriate technology. The advent of Big Data introduced several modern technologies that serve the needs of a digital world which grows very fast and becomes more complex in terms of Volume, Variety and Velocity, also known as the 3 Vs of Big Data. Volume and Velocity are quantitative metrics that refer to the size and the pace of growth of the data that needs to be handled. Variety is a qualitative metric that refers to the different types of data coming from different sources. Together they make up the 3 Vs of Big Data, and they are the reason that new specialised technologies have been introduced, either to complement or to replace traditional RDBMSs. The most well-known category of relational database alternatives in recent years is referred to as NoSQL. NoSQL stands for "Not Only SQL" and refers to a group of databases that have been designed for large-scale data storage and for high scalability across server clusters; the name refers to the databases themselves rather than to a language, as the presence of the word SQL may imply. NoSQL databases are classified into four types according to [7, 8]: key-value stores, wide-column stores, document databases and graph databases.

Key-value stores organise data by associating an identifier (key) with a simple, standalone hash table; data searches on key-value stores are performed exclusively against the keys of the key-value pairs [7]. Wide-column stores are characterised by a column-oriented structure. In wide-column stores, data is organised in distributed columns where the data instances are grouped using a key identifier connected with multiple attributes [7]. Document databases store data in documents, as their name implies. Each document is assigned a key value and contains multiple key attribute-value pairs or nested documents. Documents are encoded in a standard data exchange format such as XML, JSON (JavaScript Object Notation) or BSON (Binary JSON) [7]. The last type of NoSQL technology, and the one that we study in this thesis, is the graph database. Graph databases, unlike other NoSQL stores, are designed to deal with highly connected data. In graph databases, relational tables are replaced by interconnected key-value pairings which are organised in relational graphs. These key-value pairings are presented as nodes, and relationships are presented as edges, creating an object-oriented network of data [7].
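To make the node-and-edge description concrete, the following is a minimal Python sketch of a graph stored as key-value nodes connected by typed edges. It is only an illustration under our own assumptions: the class names, the `KNOWS` relationship type and the toy data are hypothetical and are not taken from Neo4j or from the models used later in this thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)  # key-value pairs held by the node


@dataclass
class Edge:
    source: int    # id of the start node
    target: int    # id of the end node
    rel_type: str  # name of the relationship


# Hypothetical toy data, invented for illustration only.
nodes = {
    1: Node(1, {"name": "Alice"}),
    2: Node(2, {"name": "Bob"}),
    3: Node(3, {"name": "Carol"}),
}
edges = [Edge(1, 2, "KNOWS"), Edge(2, 3, "KNOWS")]


def neighbours(node_id, rel_type):
    """Return the nodes reachable from node_id over one outgoing edge of rel_type."""
    return [nodes[e.target] for e in edges
            if e.source == node_id and e.rel_type == rel_type]


def friends_of_friends(node_id):
    """Follow two KNOWS edges; a traversal takes the place of a relational join."""
    names = []
    for friend in neighbours(node_id, "KNOWS"):
        names.extend(n.properties["name"] for n in neighbours(friend.node_id, "KNOWS"))
    return names
```

In this sketch a question about connected data is answered by walking edges from a starting node, which is the access pattern that graph databases are designed for.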

Turning to the languages that the different technologies use for data manipulation, most NoSQL technologies initially did not provide a declarative language such as SQL, and as a result all queries had to be expressed at a very low level of abstraction [9]. Nevertheless, NoSQL databases have evolved in recent years and adopted languages, either in the form of APIs or in the form of high-level query languages, to meet requirements for data definition and manipulation.

1.2 Methodology

This thesis addresses the problem of how to transform conceptual data models into graph database implementation environments. A conceptual data model consists of an information structure along with a set of integrity constraints, both of which need to be translated. The outcome of the transformation procedure is a set of statements that can reproduce the database any time they are run on a computer. In order to provide a solid framework for this transformation, we need to choose specific modelling methods and to abide by the semantics and limitations of the corresponding languages. As source model we use the PSM version of the ORM model, and as target database we use one of the most well-known graph databases, namely Neo4j, and its graph query language, Cypher. Our transformation approach consists of two steps: first we transform our conceptual schema into an intermediate specification, and then we use our transformation rules to create the statements that produce our database.

To validate the proposed transformation mechanism, we use a methodology based on the transitive principle in mathematics. As a use case, we take the well-established transformation of the presidential ORM model into the presidential relational database. We then compare our target presidential graph database with the relational one by examining their populations, using corresponding queries against both. If the query results are equal, then we can claim that the source conceptual model is also equivalent to our target graph model, and so our transformation framework is valid. For the implementation and querying of the relational model of the presidential database, we use the well-known MySQL server and the Structured Query Language (SQL) that it supports.
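The comparison step of this methodology can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the function names are hypothetical, the rows are invented placeholders rather than results from the presidential database, and we assume that both query results have already been fetched as sequences of rows with the same column order.

```python
from collections import Counter


def normalise(rows):
    """Turn a query result into a multiset of value tuples, ignoring row order."""
    return Counter(tuple(row) for row in rows)


def same_population(sql_rows, cypher_rows):
    """True when both results contain the same rows with the same multiplicities."""
    return normalise(sql_rows) == normalise(cypher_rows)


# Hypothetical placeholder rows, standing in for the result of one validation
# query run against the relational and the graph database respectively.
sql_result = [("a", 1), ("b", 2), ("b", 2)]
cypher_result = [["b", 2], ["a", 1], ["b", 2]]

print(same_population(sql_result, cypher_result))  # prints True
```

Comparing multisets rather than lists is a design choice of this sketch: neither SQL nor Cypher guarantees a row order without an explicit ordering clause, while duplicate rows should still be counted.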

1.3 Research Questions

The research questions that this thesis aims to answer are the following:

Q1: Is it possible to implement a traditional structured data model using an environment which has been designed for unstructured and semi-structured data (NoSQL environments)?

Q2: Can we transform the structured PSM conceptual model into a graph database model (semi-structured)?

Q3: Is there a "one to one" correspondence between the PSM conceptual model and the graph model?

1.4 Related Work

Much research has been done in the field of data model transformations, ranging from the classification of transformation methods [10] to specific use cases of transformations from one particular model to another [11–15]. The concept of database model transformation is closely related to the ANSI/SPARC three-level architecture for database management systems. In [11], a transformation framework is provided for transformations from conceptual data modelling techniques with an underlying object-role structure to an internal-level tree representation. A transformation from the entity-relationship model to a relational database is presented in [12], while the reverse use case is presented in [13]. Another transformation use case that again involves the relational model is described in [14]. In this thesis, we focus on graph databases as the internal-level transformation target, since there is a lack of such transformation use cases in the literature. A relevant transformation example is mentioned in [16], but it is not well established, in the sense that the transformation process is not discussed in detail.

1.5 Thesis outline

This thesis is organised in six chapters. Chapter 1 provides an introduction to the theme of the study. The theory behind conceptual models and the graph model is addressed in more detail in Chapter 2 and Chapter 3 respectively. In Chapter 4, the proposed transformation framework is introduced, and the transformation rules that correspond to each of the conceptual structures are discussed. The validation of the proposed transformation mechanism follows in Chapter 5, and the thesis closes with conclusions and recommendations for further research in Chapter 6.


Chapter 2

Conceptual Models

The goal of this chapter is to give an overview of conceptual models. The main properties and components of conceptual models are described in terms of the Object-Role Modeling (ORM) structure [17, 18]. ORM provides us with appropriate semantics to model the so-called Universe of Discourse (UoD) at the conceptual level [19]. It is a simple method that specifies the real world in terms of object types and the corresponding subtypes that play roles in fact types. In turn, fact types may consist of any number of roles, and they can themselves be treated as object types, the so-called objectified fact types. All these objects abide by several constraints that ensure the integrity and consistency of the implemented database. In the following sections, conceptual models are discussed in more detail using PSM (Predicator Set Model), which is an extension of NIAM (Natural language Information Analysis Method) and conforms to the axioms of the underlying ORM kernel [20]. A brief analysis of the components of information structures is conducted, based on the specifications described in [4, 11, 17, 20–27]. More specifically, in 2.1, information structures are introduced in a formal way. In 2.2, the population of information structures is discussed, while constraints over the instances of information structures are introduced in 2.3. In the last part of this chapter, 2.4, we discuss the notion of identification and its role in the design process.

2.1 Information Structures

2.1.1 Fundamentals of information structures

As already introduced, an information structure $\mathcal{I} = (\mathcal{P}, \mathcal{O}, \mathcal{L}, \mathcal{E}, \mathcal{F}, \mathrm{Base}, \mathrm{Sub})$ can be defined as the composition of the following components [4, 11, 17, 20–27]:

• A finite set $\mathcal{P}$ of predicators. A predicator connects an object type in $\mathcal{O}$ with a fact type in $\mathcal{F}$. In order to define which object type a predicator is associated with, we use the function $\mathrm{Base}: \mathcal{P} \to \mathcal{O}$.

• A set $\mathcal{O}$ of object types. This set contains composed object types, also called fact types, as well as atomic object types, which are treated next.

• A set of atomic object types $\mathcal{A} = \mathcal{O} - \mathcal{F}$. Atomic object types can be either entity types $\mathcal{E}$ or label types $\mathcal{L}$, the difference being that the latter can be represented (reproduced) on a communication medium [20, 22, 25].

• A partition $\mathcal{F}$ of the set $\mathcal{P}$, which contains the fact types. As mentioned above, fact types constitute the so-called composed object types, so they can be considered as object types. In order to find which fact type corresponds to a particular predicator, we can use the function $\mathrm{Fact}: \mathcal{P} \to \mathcal{F}$, where $\mathrm{Fact}(p) = f \iff p \in f$.

• The partial orders $\mathrm{Spec}$ and $\mathrm{Gen}$ that express specialization and generalization respectively.

• A function $u: \mathcal{O} \to \mathcal{O}$, which returns the PaterFamilias of an object type.

• The sets $\mathcal{G}$ and $\mathcal{S}$ containing the power types and sequence types respectively, which constitute special classes of the set of object types: $\mathcal{G} \subseteq \mathcal{O}$ and $\mathcal{S} \subseteq \mathcal{O}$.

• A function $\mathrm{Elt}: \mathcal{G} \cup \mathcal{S} \to \mathcal{O}$, which returns the type of the elements that form power types and sequence types.

• A set $\mathcal{C}$ containing the schema types, with $\mathcal{C} \subseteq \mathcal{O}$.

• A relation $\prec \subseteq \mathcal{C} \times \mathcal{O}$, which describes the decomposition of schema types.
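The core of these components (predicators, fact types as a partition of the predicators, and the functions Base and Fact) can be sketched in Python as shown below. This is an illustrative sketch under our own assumptions, not part of the PSM formalism: the class and method names are hypothetical, specialization, power, sequence and schema types are left out, and the predicator names `p` and `q` are invented. The object type names are borrowed from the WorksFor example of Section 2.1.2.2.

```python
from dataclasses import dataclass


@dataclass
class InformationStructure:
    predicators: frozenset   # P
    fact_types: dict         # F: fact type name -> frozenset of its predicators
    base: dict               # Base: predicator -> name of the object type it is attached to
    atomic_types: frozenset  # A = O - F (entity types and label types)

    def object_types(self):
        """O contains the atomic object types and the fact types."""
        return self.atomic_types | frozenset(self.fact_types)

    def fact(self, p):
        """Fact(p) = f  <=>  p in f."""
        for name, preds in self.fact_types.items():
            if p in preds:
                return name
        raise KeyError(p)

    def is_partition(self):
        """Check that the fact types are non-empty, pairwise disjoint, and cover P."""
        seen = set()
        for preds in self.fact_types.values():
            if not preds or seen & preds:
                return False
            seen |= preds
        return seen == set(self.predicators)


# Assumed example: a single binary fact type WorksFor between Employee and Department.
works = InformationStructure(
    predicators=frozenset({"p", "q"}),
    fact_types={"WorksFor": frozenset({"p", "q"})},
    base={"p": "Employee", "q": "Department"},
    atomic_types=frozenset({"Employee", "Department"}),
)
```

Storing each fact type as the set of its predicators mirrors the definition of $\mathcal{F}$ as a partition of $\mathcal{P}$, which is why `fact` only needs a membership test.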

Now that the basic components of PSM have been introduced, we can move on to the next sections, which briefly discuss the basic conceptual constructs that may occur in the modelling phase of an information system.

2.1.2 Fact types

In this section, the fundamental data modelling concept of fact types, also referred to as relationship types, is treated. As a general definition, a fact type is an association between object types. More specifically, fact types consist of the roles that object types play in a relationship, and they are identified by sets of predicators. A predicator is defined as the connection between an object type and its corresponding role in a relationship. Furthermore, the number of roles that participate in a fact type determines the degree of this fact type. A fact type may be unary, binary or n-ary.
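As a small illustration of the notion of degree, the sketch below represents a fact type as the set of its predicators and classifies it by the size of that set. The function names and the predicator names are hypothetical; Smokes and WorksFor are the examples used in the following subsections, and the ternary fact type is an invented placeholder.

```python
def degree(fact_type):
    """The degree of a fact type is the number of roles (predicators) it contains."""
    return len(fact_type)


def classify(fact_type):
    """Classify a fact type as unary, binary or n-ary by its degree."""
    n = degree(fact_type)
    if n < 1:
        raise ValueError("a fact type needs at least one predicator")
    if n == 1:
        return "unary"
    if n == 2:
        return "binary"
    return "n-ary"


# Predicator names are assumptions made for this sketch.
smokes = frozenset({"p1"})                # one role, played by Person
works_for = frozenset({"p2", "p3"})       # roles played by Employee and Department
ternary = frozenset({"p4", "p5", "p6"})   # placeholder for a three-role fact type

print(classify(smokes), classify(works_for), classify(ternary))
# prints: unary binary n-ary
```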

2.1.2.1 Unary fact types

Unary fact types consist of a single role, indicating that a subset of the instances of a given object type participates in the fact type. A simple example of a unary fact type is given in Figure 2.1. The fact type Smokes contains a single role and implies that there is a subset of persons who smoke.

Figure 2.1: Unary fact type example.

2.1.2.2 Binary fact types

Binary fact types are those which associate exactly two predicators and, as a result, consist of two roles. An example of a binary fact type is given in Figure 2.2. Here, the object types Employee and Department play the roles that constitute the fact type WorksFor.

Figure 2.2: Binary fact type example.

2.1.2.3 N-ary fact types

We define n-ary fact types as those which involve more than two predicators. This case is illustrated in Figure 2.3, which presents an example of a ternary fact type.

Figure 2.3: Ternary fact type example.

2.1.2.4 Objectified fact types

A special class of fact types is that of objectified fact types. A fact type is called objectified, and as a result can be treated as an object type, when it is the base of a predicator [11, 21, 22]. The predicators that have fact types as bases are called objectifications and form the set H, with:

H = {p ∈ P | Base(p) ∈ F}

An information structure that contains an objectification is illustrated in Figure 2.4. Again, it consists of two binary fact types, F = {f, g}, but in this case predicator r has as its base the objectified fact type f. This means that the predicator r is an objectification with Base(r) = f and Fact(r) = g. As a result, we have the set H = {r}.

Figure 2.4: Objectification example.
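The definition of H can be sketched in a few lines of Python. The predicator names p, q and s and the object types X, Y and Z are assumptions standing in for the parts of the figure that the text does not name; only f, g and r follow the example above.

```python
# Two binary fact types; r is based on the fact type f itself.
f = frozenset({"p", "q"})
g = frozenset({"r", "s"})
F = {f, g}

# Base: P -> O. The atomic types X, Y, Z are hypothetical placeholders.
base = {"p": "X", "q": "Y", "r": f, "s": "Z"}


def objectifications(base, fact_types):
    """H = {p ∈ P | Base(p) ∈ F}: predicators whose base is a fact type."""
    return {p for p, b in base.items() if b in fact_types}


H = objectifications(base, F)
```

Under these assumptions H evaluates to the singleton set containing r, in line with the example.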

2.1.2.5 Bridge types

Another sort of fact type worth mentioning is the bridge type. A bridge type is defined as a binary fact type that connects abstract object types to concrete ones. A fact type f qualifies as a bridge type, written Bridge(f), and as a result belongs to the set of bridge types B, when:

∃p,q [f = {p, q} ∧ Base(p) ∈ L ∧ Base(q) ∉ L]

The separation between abstract and concrete object types is also reflected in the rule that label types may only appear in bridge types, expressed as:

Base(p) ∈ L ⇒ Bridge(Fact(p))

Moreover, the operators concr and abstr can be used to extract the predicators that correspond to the concrete and abstract part of a bridge type respectively [25]. In the example of Figure 2.5, we can see the connection between the abstract object types Employee and Department and the concrete object types (labels) Name and Departmentname, through the bridge types [Employee] with [Name] and [Department] with [Departmentname] respectively.

Figure 2.5: Bridge type example.
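The bridge type condition can be checked mechanically. The sketch below assumes that fact types are sets of predicators and that Base is a dictionary; the predicator names are hypothetical.

```python
LABELS = {"Name", "Departmentname"}          # label types L from the example

# Base: P -> O, with assumed predicator names.
BASE = {"e1": "Employee", "n1": "Name", "e2": "Employee", "d2": "Department"}

has_name = frozenset({"e1", "n1"})           # Employee with Name
works_for = frozenset({"e2", "d2"})          # Employee works for Department


def is_bridge(f, base, label_types):
    """Bridge(f): f = {p, q} with Base(p) ∈ L and Base(q) ∉ L."""
    if len(f) != 2:
        return False
    p, q = tuple(f)
    # Exactly one of the two bases must be a label type.
    return (base[p] in label_types) != (base[q] in label_types)
```

The exclusive-or captures the existential quantifier in the definition: it does not matter which of the two predicators plays the role of p.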

2.1.3 Specialization(Subtyping)

In this section, we discuss the concept of specialization, which is also referred to as subtyping [11, 21, 22, 24]. Specialization is a mechanism for representing one or more (possibly overlapping) subtypes of an object type [25, 27]. It accommodates those instances of an object type for which certain additional properties are to be recorded. For instance, Figure 2.6 describes a situation in which the subtype Adult is a specialization of Person, meaning that the subset of Persons with an age greater than or equal to 18 comprises the set of Adults.

Figure 2.6: Specialization example.

As already introduced in 2.1.1, specialization is a partial order on object types, and more specifically on non-label types (Spec ⊆ E × O), with the constraint that each element of E can be linked to a unique top object type. This top element is called the pater familias and is given by the function u: E → E. Furthermore, specialization introduces the notion of attached, or type related, predicators. Two predicators p and q are type related, denoted p ∼ q, if u(Base(p)) = u(Base(q)). The abbreviation a Spec b is interpreted as: a is a specialization (subtype) of b. The example in Figure 2.7, originally presented in [22, 27], helps to clarify the notion of specialization (subtyping). In this example, we have the following specialization relations:

Flesh-eating Spec Animal

Plant-eating Spec Animal

Carnivore Spec Flesh-eating

Omnivore Spec Flesh-eating

Omnivore Spec Plant-eating

Herbivore Spec Plant-eating

Figure 2.7: Specialization hierarchy example.

Specialization relations are represented as arrows, and the top element is the atomic type Animal. As a result, the pater familias of Carnivore is u(Carnivore) = Animal. Intuitively, a subtyping relation between a subtype and a supertype means that the instances of the subtype are also instances of the top element of the hierarchy. For example, the instances of Carnivore are also instances of Animal. This implies that a subtype is used only when a fact type is exclusively attached to this subtype and not to the supertype. For example, the hierarchy shown in Figure 2.7 could be extended by adding the fact type eats, which records the kind of plants that Plant-eating animals eat. In this fact type, the subtype Plant-eating is used instead of Animal, as shown in Figure 2.8.

Figure 2.8: Specialization hierarchy with connected subtype.
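To make the pater familias function concrete, the following sketch computes it for the animal hierarchy by following specialization edges upward. The dictionary encoding of Spec and the function names are assumptions of this sketch, not part of the PSM definition.

```python
# Direct specialization edges of the animal hierarchy: subtype -> supertypes.
SPEC = {
    "Flesh-eating": {"Animal"},
    "Plant-eating": {"Animal"},
    "Carnivore": {"Flesh-eating"},
    "Omnivore": {"Flesh-eating", "Plant-eating"},
    "Herbivore": {"Plant-eating"},
}


def pater_familias(x, spec=SPEC):
    """Follow specialization edges upward until types without supertypes remain."""
    tops, stack = set(), [x]
    while stack:
        t = stack.pop()
        parents = spec.get(t, set())
        if parents:
            stack.extend(parents)
        else:
            tops.add(t)
    if len(tops) != 1:
        # The constraint on Spec demands a unique top element.
        raise ValueError(f"{x} has no unique top element: {tops}")
    return tops.pop()


def type_related(a, b):
    """Object types are type related when they share a pater familias."""
    return pater_familias(a) == pater_familias(b)
```

For predicators, type relatedness would be obtained by applying `type_related` to their bases. Omnivore has two direct supertypes, yet both paths end in Animal, so the uniqueness constraint still holds.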

2.1.4 Generalisation

Turning to the concept of generalization, we should first make clear that generalization is not the inverse of specialization, as its name might suggest. Generalization provides a mechanism for creating new object types by uniting existing object types [20, 25, 27]. In generalization, properties are inherited upward rather than downward, as is the case in specialization; as a result, the identification of a generalized object type derives from the identification of its constituent object types (specifiers). This leads to the conclusion that only non-label types can be generalized object types. Generalization can therefore be defined as a partial order Gen ⊆ E × E, with the abbreviation a Gen b being interpreted as: a is a generalization of b, or b is a specifier of a [20, 25, 27, 28]. An apt example of generalization, originally presented in [28], is illustrated in Figure 2.9. In this example, the object types Car and House are generalized into the object type Product, which inherits its identification from the corresponding specifiers.

Figure 2.9: Generalization example.

2.1.5 Power Types(or Set Types)

The concept of power typing in PSM is equivalent to the concept of power sets in formal set theory [20, 25, 27, 28]. The instances of a power type are sets of instances of its element type. As a result, a power type is identified by its element type, which is given by the function Elt: G → O. As is also the case in specialization and generalization, only non-label types can occur as element types in power typing, because of the separation between abstract and concrete object types. A typical example of power typing is shown in Figure 2.10. The instances of the power type Convoy are sets of instances of the element type Ship, that is, Elt(Convoy) = Ship. The fact type ∈_Convoy = {∈^p_Convoy, ∈^e_Convoy} is implied by the concept of power typing and need not be drawn unless it is subject to constraints.

Figure 2.10: Power type example.

2.1.6 Sequence Types

Sequence typing is the construct that facilitates the representation of sequences composed of an underlying element type. Similarly to power typing, the corresponding element type is given by the function Elt: S → O, and an implied fact type ∈_x = {∈^s_x, ∈^e_x} expresses the relation between the instances of a sequence type x and the instances of its element type Elt(x). Moreover, an implicit fact type @_x = {@^s_x, @^i_x} describes the position of an element in a sequence, as presented in the example of Figure 2.11.

Figure 2.11: Sequence type example.

2.1.7 Schema Types

A schema type is an object type with an underlying decomposition, meaning that we can decompose large schemata into objectified sub-schemata [4, 25]. The schema typing concept provides us with a relation ≺ ⊆ C × O that facilitates the decomposition of an information structure x into y (x ≺ y). For each schema type x and each object type y in its decomposition, an implicit fact type ∈_{x,y} = {∈^c_{x,y}, ∈^d_{x,y}} exists, associating a schema object with an object from its decomposition. A typical example of schema typing is the decomposition of Activity Graphs presented in [20, 25, 27] and illustrated in Figure 2.12.

Figure 2.12: Schema type example.

2.2 Populations

2.2.1 Introduction to the Universe of Discourse (UoD)

The Universe of Discourse (UoD) refers to the part of the real world whose data corresponds to an instantiation, or population, of a given information structure [25, 26]. Several states of the UoD may apply to a particular information structure [26]. In fact, an information structure acts as a prescription for real-world data. As a result, the population Pop_I of an information structure I = ⟨P, O, F, G, S, C, Base, Spec, Gen, Elt⟩ is an assignment of values to the object types in O, governed by a number of axioms that apply to the information structure. The correspondence between an information structure and its population is denoted as IsPop(I, Pop_I), and a population is formally defined as the mapping Pop: O → ℘(Ω), where Ω denotes the set of valid instances for an information structure. The population of an atomic object type in A is defined as a set of values corresponding to that object type. The population of a label type in L consists of values that derive from the associated concrete domain, such as strings or natural numbers (Dom: L → D), while the population of an entity type in E consists of unstructured values [25]. The rules that populations have to comply with are introduced in the next section.

2.2.2 Population rules

To begin with, the population of a label type may contain only values that derive from the corresponding concrete domain:

x ∈ L ⇒ Pop(x) ⊆ Dom(x)

Turning to fact types, the population of a fact type in F is a set of tuples [25]. A tuple t in the population of a fact type f is defined as a mapping of the participating predicators to values of the proper type, and is denoted as t: f → Ω. This requirement is also referred to as the Conformity Rule:

∀f∈F ∀t∈Pop(f) ∀p∈f [t(p) ∈ Pop(Base(p))]
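The Conformity Rule lends itself to a direct check. In the sketch below, a population is assumed to be a dictionary from object types to collections of instances, and a tuple is a dictionary from predicators to values; all names and sample values are hypothetical.

```python
works_for = frozenset({"emp", "dept"})             # a binary fact type
BASE = {"emp": "Employee", "dept": "Department"}   # Base: P -> O

pop = {
    "Employee": {"e1", "e2"},
    "Department": {"d1"},
    works_for: [{"emp": "e1", "dept": "d1"}],      # tuples t: f -> Ω
}


def conforms(pop, base, fact_types):
    """True if every tuple t of every fact type f has t(p) ∈ Pop(Base(p))."""
    return all(
        t[p] in pop[base[p]]
        for f in fact_types
        for t in pop[f]
        for p in f
    )


# A population with a tuple that refers to an unknown employee e3.
bad_pop = dict(pop)
bad_pop[works_for] = pop[works_for] + [{"emp": "e3", "dept": "d1"}]
```

Here `conforms(pop, BASE, {works_for})` holds, whereas `bad_pop` violates the rule because e3 is not in the population of Employee.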

The subtyping hierarchy is reflected in populations through the Specialization Rule, which requires that the population of a supertype contains the population of each corresponding subtype:

x Spec y ⇒ Pop(x) ⊆ Pop(y)

The generalization hierarchy is expressed by the Generalization rule, which entails that the population of a generalized object type is composed of the populations of its specifiers:

gen(x) ⇒ Pop(x) = ⋃_{x Gen y} Pop(y)
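The Specialization and Generalization rules can be illustrated with the Adult/Person and Product examples from Section 2.1. The instance values below are invented for the sketch, and the function names are assumptions.

```python
pop = {
    "Person": {"ann", "bob", "carl"},
    "Adult": {"ann", "bob"},
    "Car": {"c1"},
    "House": {"h1", "h2"},
}


def satisfies_specialization(pop, spec_pairs):
    """x Spec y ⇒ Pop(x) ⊆ Pop(y) for every pair (x, y)."""
    return all(pop[x] <= pop[y] for x, y in spec_pairs)


def generalized_population(pop, specifiers):
    """Pop(x) as the union of the populations of the specifiers of x."""
    return set().union(*(pop[y] for y in specifiers))


# Product is a generalization of Car and House.
pop["Product"] = generalized_population(pop, ["Car", "House"])
```

The asymmetry between the two rules is visible in the code: specialization only constrains a population that is given independently, whereas generalization fully determines the population of the generalized type.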

The rule stating that only type related abstract object types may have common instances is the Strong Typing rule:

x, y ∉ L ∧ x ≁ y ⇒ Pop(x) ∩ Pop(y) = ∅

The Power Type rule indicates that the instances of a power type are sets of instances coming from the population of the corresponding element type:

x ∈ G ∧ y ∈ Pop(x) ⇒ y ∈ ℘(Pop(Elt(x))) − {∅}

The population of the implicit fact type ∈_x = {∈^p_x, ∈^e_x} is described by the Power Base rule:

x ∈ G ⇒ Pop(∈_x) = { {∈^p_x: u, ∈^e_x: v} | u ∈ Pop(x) ∧ v ∈ u }
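Both power type rules can be sketched for the Convoy example of Section 2.1.5. The ship identifiers are made up, and each tuple of the implicit fact type is encoded as a simple (set, element) pair, which is an assumption of this sketch.

```python
from itertools import combinations

ships = {"s1", "s2", "s3"}                                # Pop(Ship)
convoys = {frozenset({"s1", "s2"}), frozenset({"s3"})}    # Pop(Convoy)


def nonempty_subsets(s):
    """℘(s) − {∅}, built explicitly for small sets."""
    items = list(s)
    return {
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    }


def satisfies_power_type(pop_power, pop_element):
    """Every instance of the power type is a non-empty set of element instances."""
    allowed = nonempty_subsets(pop_element)
    return all(y in allowed for y in pop_power)


def power_base(pop_power):
    """Pop(∈_x): one (set, element) pair for every membership v ∈ u."""
    return {(u, v) for u in pop_power for v in u}
```

Enumerating the power set is exponential and only serves the illustration; a practical check would test `y and y <= pop_element` for each instance instead.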

The population of sequence types is derived from the Sequence Type rule:

x ∈ S ∧ y ∈ Pop(x) ⇒ y ∈ Pop(Elt(x))^+

In turn, the fact types ∈_x = {∈^s_x, ∈^e_x} and @_x = {@^s_x, @^i_x}, which have been introduced in 2.1.6, are described by the Sequence Decomposition rules:

x ∈ S ⇒ Pop(∈_x) = { {∈^s_x: u, ∈^e_x: v} | u ∈ Pop(x) ∧ ∃i∈I [u[i] = v] }

x ∈ S ⇒ Pop(∈_x) = { {∈^s_x: u, ∈^e_x: v} | u ∈ Pop(x) ∧ u(∈^s_x)[v] = u(∈^e_x) }

The last concept introduced in Section 2.1 is that of schema types. The population of a schema type is governed by the Decomposition rule:

x ∈ C ∧ y ∈ Pop(x) ⇒ IsPop(I_x, y)

The population of the fact type ∈_{x,y} = {∈^c_{x,y}, ∈^d_{x,y}} is described by the Decompositor rule:

x ≺ y ⇒ Pop(∈_{x,y}) = { {∈^c_{x,y}: u, ∈^d_{x,y}: v} | u ∈ Pop(x) ∧ v ∈ u(y) }

2.3 Integrity Constraints

The populations of information structures are usually restricted by so-called static integrity constraints. As a result, the population of a schema should satisfy the requirements expressed by both the information structure and the constraints. According to [4], several kinds of static integrity constraints exist: uniqueness, total role, occurrence frequency, exclusion, membership, subset, set equality and enumeration constraints. The most important of these are the uniqueness constraints and the total role constraints, since they are used for identification, which is treated in the next section (2.4).

2.4 Identification

In this section, identification is briefly introduced. A distinction can be made between weak identification and structural identification, which are thoroughly discussed in [4, 22, 25]. Informally, a population is called weakly identified if each instance of an object type can be uniquely distinguished by its combination of properties. Label types are elementary data types and are identified by themselves, as a consequence of their direct representation. Entities, on the other hand, are represented by their properties and, as a result, are identified by the roles they play in facts. In turn, composite object types are identified by their components, as they are represented by tuples, while object types such as power types are identified by their elements.

The role of structural identification is to guarantee that all valid populations are weakly identified. This is achieved by using the facilities provided by the information structure in such a way that any object can be uniquely described by a set of properties. By using uniqueness and total role constraints, we can assign identifying properties to objects through bridge types, yielding a unique standard name for each object instance.
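The informal notion of weak identification can be sketched as a uniqueness check over property combinations. This is a deliberate simplification of the formal definitions in [4, 22, 25]: instances are assumed to be described by flat dictionaries of properties, and the sample data is invented.

```python
def weakly_identified(properties):
    """True if no two instances share the same combination of properties.

    `properties` maps each instance to a dict of property name -> value.
    """
    seen = set()
    for props in properties.values():
        key = frozenset(props.items())
        if key in seen:
            return False
        seen.add(key)
    return True


employees = {
    "e1": {"name": "Ann", "department": "Sales"},
    "e2": {"name": "Bob", "department": "Sales"},
}

# Adding a second Ann in Sales makes two instances indistinguishable.
clash = dict(employees, e3={"name": "Ann", "department": "Sales"})
```

In this reading, a uniqueness constraint on the name role would be one way to rule out populations such as `clash` at the structural level.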

Chapter 3

The Graph Data Model

In this chapter, we introduce the Neo4j graph database model, based on the information provided in [16, 29, 30]. First, we give an overview of graph databases by introducing the basic concepts and constructs of the property graph model. We also discuss the advantages of graph databases in comparison with traditional relational database systems. We then introduce the Cypher graph query language and discuss modeling principles and best practices for modeling data for the Neo4j graph database.

3.1 Graph Databases

A graph is defined as a collection of vertices and edges. In the context of databases, vertices are usually referred to as nodes, and edges as relationships. In a less formal manner, a graph database model is a set of entities connected by one or more kinds of relationships. Modeling data for graph databases is a relatively simple process because of the high level of connectedness that graph databases provide, which shrinks the gap between the logical and the physical model. Unlike relational data modeling, which requires the use of foreign keys in order to connect entities, graph databases are very expressive, and they reduce the impedance mismatch between analysis and implementation that has been a hassle in relational database implementations [16]. Another important trait of graph data models is that they simplify the communication of the kinds of questions that we want to pose in the domain we are modeling.

3.1.1 The Neo4j Graph Database

Several commercial graph database management systems exist nowadays. For the purposes of this thesis, we use Neo4j, one of the most popular and best known graph database management systems. Before we dive into the data modeling properties of Neo4j, we present some of its features as a graph database management system.

Neo4j is a robust, scalable and high-performance graph database. It is an ACID-compliant database, which translates into data reliability. ACID is the acronym for Atomicity, Consistency, Isolation and Durability, which are among the most fundamental goals of most database management systems [29]:

• Atomicity means that a transaction in the database must be all or nothing. If a part of the transaction fails, then the entire transaction fails, and as a result the state of the database stays unchanged.

• Consistency means that only valid data may be written to the database. Valid data is data that complies with the rules defined by the schema. An interesting point here is that Neo4j is a schema-free database, allowing looser consistency rules, especially in the early stages of the development cycle. However, as we move towards the final product a schema is applied, introducing consistency.

• Isolation means that transactions that are executed concurrently do not affect each other. Consider the case in which a writing transaction is executed along with a reading transaction: the reading transaction has to work with the existing data of the database, without taking into consideration the changes that the writing transaction will make to the data.

• Durability means that once a transaction is committed, it cannot disappear, even in case of failure. This is achieved by keeping appropriate logs.

Another characteristic of the Neo4j graph database is that it is mostly suited to OLTP, as it is optimized for transactional performance. This means that it is designed to respond quickly to incoming queries. However, this does not mean that we cannot use it for OLAP. In fact, Neo4j seems to execute some analytical tasks more efficiently than relational systems [29]. Moreover, Neo4j supports scalability, high availability and fault tolerance by providing clustering features that help it deal with the OLTP workload. More specifically, it provides a master-slave clustering architecture, as shown in Figure 3.1. This clustering solution enables horizontal scaling, which enhances the performance of the system. Last but not least, fault tolerance is enabled by facilities for creating replicas of the master database.

Figure 3.1: Neo4j clustering architecture

3.1.2 The Labeled Property Graph Model

There are several graph models in graph theory, including property graphs, hypergraphs and triples. The Neo4j graph database uses the labeled property graph model for the sake of flexibility and versatility [29]. This graph data model provides us with the following structural components:

• Nodes, which are used for the representation of entity information. We can liken nodes to documents that store properties as key-value pairs.

• Relationships, which are used for connecting nodes and providing structure in the graph. A relationship consists of a single name, a start node, an end node and a direction. The combination of a relationship's direction and its name provides semantic clarity to the structure of the nodes.

• Properties, which have the form of name-value pairs. Properties can be added to both nodes and relationships. Adding properties to nodes is just like adding attributes in a relational model, while adding properties to relationships is used for adding metadata and semantics to them, and for introducing constraints in queries at runtime.

• Labels, which are used for assigning roles or types to nodes. In this way, we are able to group nodes into sets, making queries more efficient and easier to write. We can add any number of labels to a node, assigning different roles to it.

A simple graph model, consisting of three labeled nodes, two relationships and their corresponding properties, is represented in Figure 3.2.

Figure 3.2: A simple graph model example.
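The four structural components can be mirrored by a small in-memory model. This is only an illustrative sketch in Python and says nothing about how Neo4j stores data; the class names, the helper function and the sample data are assumptions and do not reproduce the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # any number of labels
    properties: dict = field(default_factory=dict)   # name-value pairs


@dataclass
class Relationship:
    name: str                                        # single relationship name
    start: Node                                      # direction: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


def nodes_with_label(nodes, label):
    """Group nodes by label, as a label-based query would."""
    return [n for n in nodes if label in n.labels]


# Hypothetical data: three labeled nodes and two relationships.
ann = Node({"Person", "Employee"}, {"name": "Ann"})
bob = Node({"Person"}, {"name": "Bob"})
acme = Node({"Company"}, {"name": "Acme"})
nodes = [ann, bob, acme]
rels = [
    Relationship("WORKS_FOR", ann, acme, {"since": 2015}),
    Relationship("KNOWS", ann, bob),
]
```

Note that `ann` carries two labels, so she is found both by a query for Person and by a query for Employee.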

3.1.3 A Comparison to Relational Data Model

In this section, we briefly discuss the main differences between traditional relational databases and graph databases, which render the latter more appropriate for connected data. To facilitate this comparison, we use an example that is originally introduced in [16, 29] and illustrated in Figure 3.3.

In the relational approach, substantial complexity is introduced by foreign keys and join tables. In the normalized approach, we introduce foreign key constraints in order to facilitate one-to-many relationships, and join tables in order to facilitate many-to-many relationships. As a result, as the data scales up and the information structure becomes more complex, the performance of the relational model is burdened by expensive joins, sparse tables and checks on nullable columns [16].

In the example of Figure 3.3, the join tables AppDatabase and UserApp are introduced to connect applications to database servers and users to applications respectively, and they introduce additional complexity. The denormalized approach is an alternative that reduces the performance strain imposed by table joins by adding more columns to a particular table in order to inline data instead of creating join tables. However, denormalization involves data redundancy, which is not a trivial issue, especially when the volume of data increases exponentially. Moreover, the transition between normalized and denormalized designs, along with the structural migrations that may occur during the lifetime of a system, puts data integrity at risk.

Figure 3.3: Relational database for the data center domain.

On the other hand, graph databases are built to serve connectedness. They are closely aligned with the domain, and they support evolution of the data model without compromising integrity or reducing performance. In graph databases, we only need to ensure that nodes are assigned appropriate labels and properties, and that they are connected by appropriate relationships, so that the implemented database complies with the corresponding domain. For our running example, the graph model to be implemented is shown in Figure 3.4. No join tables, normalization or denormalization are needed here. It is worth mentioning that the nodes use multiple labels, which enables us to target particular groups of nodes with our queries.

To sum up, graph databases outperform relational databases in query performance and schema evolution in cases of highly interconnected data or regularly changing schemata.

3.2 Data Modeling in Neo4j

In this section, we introduce some generic rules that we should take into consideration when modeling data for the Neo4j graph database, according to [16, 29]. These rules will support the discussion in later chapters, where we introduce the model transformation rules. Furthermore, we introduce the Cypher query language, which is used by the Neo4j graph database management system.

Figure 3.4: Graph database for the data center domain.

3.2.1 Modeling Best Practices

To begin with, we introduce some of the best practices. As is usually the case when designing any database, we need to take into consideration the queryability of the model. This means that there is no perfect way to model data; rather, we are led by the questions that we want to ask of the data, and we need to accept specific trade-offs. In order to obtain queryability, we should align relationships with use cases, meaning that we should introduce different relationships between nodes for different use cases. The strategy that helps us align relationships with use cases is the proper naming of relationships: the relationship type names that we use should be as specific as possible. A special case of relationship frequently occurs when we want to associate more than two entities or concepts. These relationships are called n-ary relationships. When we come across n-ary relationships in the modeling process, the best thing to do is to introduce an additional node to represent the relationship.
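The advice on n-ary relationships can be sketched as a small reification step: the fact becomes a node of its own, connected to each participant by a binary relationship. The function name, the relationship type names and the enrolment data are all hypothetical.

```python
def reify(hub_id, fact_label, participants):
    """Represent one n-ary fact as an intermediate node plus binary relationships.

    `participants` maps a relationship type name to the id of the target node.
    Returns the intermediate node and a list of (start, type, end) triples.
    """
    hub = {"id": hub_id, "labels": {fact_label}}
    edges = [(hub_id, rel_type, target) for rel_type, target in participants.items()]
    return hub, edges


# A ternary fact: a student follows a course during a semester.
hub, edges = reify(
    "enrolment-1",
    "Enrolment",
    {"OF_STUDENT": "alice", "IN_COURSE": "databases", "DURING": "2017-spring"},
)
```

Because the fact is now a node, it can also carry properties of its own, such as a grade, which a plain binary relationship between two of the participants could not express unambiguously.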

Another best practice in graph modeling is to granulate the nodes as much as possible. This technique is similar to the normalization used in relational models. Bearing in mind that normalization is relatively cheap in graph modeling, we prefer to create smaller and smaller node structures. However, granulation of nodes is not a panacea. We should always keep in mind that we want to obtain query efficiency. Let us consider the example of Figure 3.5. In order to determine which is the best solution for this example, we should consider whether we will evaluate the alcohol percentage at query time or not. If so, it would be more efficient to choose the granulated version; otherwise we could keep the "fatter" version for the sake of simplicity. As an extension of the granulate pattern, we can use in-graph indexes when appropriate. Indexes can be useful in cases such as range queries and time series.

Figure 3.5: Granulated and not granulated structure example.

3.2.2 Modeling Pitfalls

As is always the case in data modeling, apart from best practices there are also some pitfalls that we should detect and avoid in graph database modeling. To begin with, we should always keep in mind that graphs are all about connections between nodes. Relationship types provide graphs with structure and query power. As a result, unconnected graphs constitute an antipattern in graph modeling. Unconnected graphs are those which consist of unconnected nodes, as shown in Figure 3.6.

On the other hand, the dense node pattern constitutes a modeling pitfall as well. Dense nodes are nodes that are connected to many parts of the graph. Such nodes can influence the performance of query traversals, because the graph database management system needs to evaluate all the relationships that are connected to a particular node in order to determine which path to follow when executing a traversal. Moreover, from a modeling and maintenance perspective, dense nodes are relatively complex to handle, because it is difficult to distinguish the concepts and constraints expressed by the connected relationships. Several strategies exist to treat the density issue, from a technical perspective as well as from a modeling perspective. The technical approach involves hardware and file system configurations that will not concern us. The modeling perspective is of greater interest and involves potential adjustments to the way in which we represent dense nodes. One such adjustment strategy is to spread the dense relationships across meta-nodes that are connected to the dense node.

Figure 3.6: Unconnected graph.
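One possible reading of the meta-node strategy for dense nodes is sketched below: the neighbours of a dense node are grouped into buckets, and each bucket is attached to the dense node through a single meta-node. The bucketing criterion, the relationship names HAS_GROUP and MEMBER and the sample data are assumptions of this sketch.

```python
from collections import defaultdict


def spread_over_meta_nodes(dense, neighbours, bucket_of):
    """Replace direct relationships of a dense node by meta-node groups.

    Returns (start, type, end) triples: one HAS_GROUP edge per bucket
    and one MEMBER edge per original neighbour.
    """
    buckets = defaultdict(list)
    for n in neighbours:
        buckets[bucket_of(n)].append(n)

    edges = []
    for key, members in buckets.items():
        meta = f"{dense}:{key}"                  # id of the meta-node
        edges.append((dense, "HAS_GROUP", meta))
        edges.extend((meta, "MEMBER", m) for m in members)
    return edges


# Hypothetical dense node with six followers, bucketed by country.
followers = [("f1", "NL"), ("f2", "NL"), ("f3", "GR"),
             ("f4", "GR"), ("f5", "NL"), ("f6", "GR")]
edges = spread_over_meta_nodes("celebrity", followers, lambda f: f[1])
dense_degree = sum(1 for start, _, _ in edges if start == "celebrity")
```

The dense node now has one relationship per bucket rather than one per neighbour, at the cost of an extra hop in every traversal that passes through it.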

Another common pitfall, related to the concept of granulated nodes discussed in 3.2.1, is the use of rich properties in nodes. This pattern contradicts the concept of the granular model, and the remedy for this antipattern is the granular model itself. As an extension of the rich property antipattern, we introduce the antipattern of nodes representing multiple concepts. In this case, nodes have multiple properties that represent different concerns or concepts. Consider the example of Figure 3.7. In this model, the country, language and currency concepts are mingled together in a single node, which is a red flag in terms of query efficiency and model maintainability. Instead, we separate the concepts by introducing separate node structures, as shown in Figure 3.8.

Figure 3.7: Node representing multiple concepts.

Figure 3.8: Splitting node with multiple concepts.
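The split of a multi-concept node can be sketched as a simple transformation over dictionaries. The property keys, the labels, the relationship names SPEAKS and USES and the sample country are assumptions; the figures may use different names.

```python
def split_node(node, label, key, concepts):
    """Move each property listed in `concepts` into a node of its own.

    `concepts` maps a property key to (new label, relationship name).
    Returns a list of (label, properties) nodes and (start, type, end) triples.
    """
    main = (label, {key: node[key]})
    nodes, rels = [main], []
    for prop, (other_label, rel_name) in concepts.items():
        other = (other_label, {"name": node[prop]})
        nodes.append(other)
        rels.append((node[key], rel_name, node[prop]))
    return nodes, rels


# Hypothetical node mixing three concepts.
country = {"name": "Netherlands", "language": "Dutch", "currency": "Euro"}
nodes, rels = split_node(
    country,
    "Country",
    "name",
    {"language": ("Language", "SPEAKS"), "currency": ("Currency", "USES")},
)
```

After the split, a query about currencies no longer has to touch country nodes at all, and a second country using the Euro could point to the same Currency node once duplicates are merged.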

3.2.3 An Introduction to Cypher query Language

In this section, we introduce the Cypher query language, which is used in the Neo4j graph database. This will enable us to better understand and implement graph data models. Cypher is a powerful language that provides an easy way to access, query and update a graph store. It is a declarative language designed to accommodate the needs of both developers and operations professionals. Cypher focuses on what to retrieve from a graph rather than on how it is retrieved, the latter being the concern of imperative and scripting languages.

The Cypher language is built up from various distinct clauses, similarly to SQL. There are several clauses that enable us to create, update and read from a graph. The following short examples will help in understanding the Cypher clauses and syntax. Starting with the basic syntax of Cypher, nodes are represented as a pair of parentheses, undirected relationships are represented as a pair of dashes (--), and directed relationships are represented as a pair of dashes accompanied by an arrowhead stating the corresponding direction (<--, -->). Identifiers, properties and label information are included directly between the parentheses in the case of nodes, while a pair of brackets is used in the case of relationships (e.g. -[identifier:label {property}]-). Combining these two main constructs, we can express patterns or facts. We can take a look at the example of Figure 3.9 to discuss this basic syntax in more detail. In this example, we have three nodes that can be written in Cypher as:

• (tom:Person {name:'Tom Hanks', born:'1956'})

• (robert:Person {name:'Robert Zemeckis', born:'1956'})

• (forrest_gump:Movie {title:'Forrest Gump', released:'1994'})

We also have the relationships ACTED_IN and DIRECTED, which connect the nodes and are written in Cypher as -[:ACTED_IN]-> and -[:DIRECTED]-> respectively. Combining these constructs, we obtain the patterns (or facts) illustrated in the given example:

• (tom:Person {name:'Tom Hanks', born:'1956'})-[:ACTED_IN]->(forrest_gump:Movie {title:'Forrest Gump', released:'1994'})

• (robert:Person {name:'Robert Zemeckis', born:'1956'})-[:DIRECTED]->(forrest_gump:Movie {title:'Forrest Gump', released:'1994'})

What we have defined so far is just the basic syntax of Cypher. We now move on to describe some clauses that are used to create and query the graph, starting with the CREATE clause. We use the CREATE clause along with patterns in order to create data. Moreover, we can return the data that we create by using the RETURN clause along with the identifiers that correspond to the patterns that we create. In our example, we create and return the given structure by using the following clauses:

CREATE (p1:Person {name:'Tom Hanks', born:'1956'})-[r1:ACTED_IN]->(m:Movie {title:'Forrest Gump', released:'1994'})
CREATE (p2:Person {name:'Robert Zemeckis', born:'1956'})-[r2:DIRECTED]->(m)
RETURN p1, r1, p2, r2, m

Figure 3.9: A simple graph model.

After creating the data, our next concern is to query this data and get the results that we want. Cypher provides us with appropriate match patterns and filtering facilities. More specifically, we use the MATCH clause to describe what we are searching for. A statement like this returns a row for each successful pattern match. In the running example, if we want to find all the nodes labelled as 'Movie', it is enough to run the following Cypher query against our database:

MATCH (m:Movie)
RETURN m

We can also search for a specific person named ’Tom Hanks’, using this statement:

MATCH (p:Person {name:'Tom Hanks'})
RETURN p

In this example, what we actually do is filter the nodes labelled as 'Person' by defining a very specific match pattern. Nevertheless, the same result can be obtained by using a more generic match that looks for all the nodes labelled as 'Person' and then using the WHERE clause to narrow the results down to those with the property name:'Tom Hanks'.

MATCH (p:Person)
WHERE p.name = 'Tom Hanks'
RETURN p
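Patterns in a MATCH clause can also include relationships. As a small sketch against the graph of Figure 3.9 (the column aliases actor and movie are illustrative names that do not appear in the original example), the following query looks for the persons who acted in a movie:

```cypher
// Find every Person node connected to a Movie node through an ACTED_IN relationship
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name AS actor, m.title AS movie
```

Replacing ACTED_IN with DIRECTED in the pattern would return the directors instead.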

The CREATE and MATCH clauses are the most fundamental ones in the Cypher language. However, there are many more important statements that can be used to manipulate graph databases and that help us to express several modeling concepts. Some of these statements will be dealt with in the following chapters, when we describe the model transformations. For the enthusiasts of Neo4j, an extensive description of the Cypher language is available in the Neo4j manual¹.

¹ The Neo4j Manual v2.3.0, 2015, http://neo4j.com/docs/stable/index.html


Chapter 4

Data Model Transformation

In this chapter, we introduce the transformation framework for the ORM to graph model transformation. More specifically, we first treat each construct of the ORM model, as described in Chapter 2, separately. Then we bind the transformation pieces together in order to define a transformation framework that enables us to transform entire conceptual schemes. In our approach, we adopt a strategy that is oriented towards avoiding redundancies as much as possible. However, avoiding redundancies is not always possible, because it might lead to a violation of concepts.

4.1 Transformation Framework

The general transformation framework that we adopt is based on using the identification of the different object types as an intermediate design phase. This means that we first transform object types into their identification, and then we process the fact types and apply the constraints. However, the semantics of conceptual structures are defined in terms of populations, which means that populations must be transformed too when we implement the graph model.

4.2 Transformation Rules

4.2.1 Simple Entity and Fact Types

We start with the transformation of simple entity types. Simple entity types are represented in the graph model as nodes, which contain the properties that uniquely define the instances of that entity type. As a result, in order to implement an entity type, we should first transform it into its identification. Then, in order to produce the corresponding graph model, we create nodes labelled with the entity type name, having as properties the identifiers of this entity type. After that, we treat the fact types that involve this particular entity type, taking into consideration the constraints to be applied, which determine the way that fact types are treated. Concerning the uniqueness constraint, two cases can be distinguished: the compound uniqueness constraint, which constrains more than one role, and the functional role, which refers to a uniqueness constraint attached to a single role. Moreover, we can distinguish total functional roles and optional functional roles, depending on whether a total role constraint is attached to the unique role. When transforming a construct that is composed of simple entity types and one of the kinds of fact types that we have just described, we define the following cases:

• If we have a compound uniqueness constraint, we just create a simple relationship between the nodes.

• If we have a functional role, we create intermediate meta-nodes to express the constraints that apply to the fact type.

In order to better explain this mapping, we will use the example of Figure 4.1. In this example, we can see four entity types A, B, C and D, which participate in the fact types r1, r2 and r3. In order to transform this simple scheme into a graph model, we first need to replace the entity types with their identification. In Figure 4.2, we can see a sample of this transformation step applied to the entity type A. Acode uniquely identifies the instances of the entity type A, so we should guarantee that the property Acode is unique among the nodes labelled as A. Accordingly, we replace all the entity types with their identification, and then we can move on to the fact types.

We start with the fact type r3, which has a compound uniqueness constraint on it. This kind of constraint expresses a many-to-many relationship between the entities that play roles in it. As a result, we can transform this kind of fact type by producing labelled nodes for each instance of the entity types that participate in the fact type r3 and connecting them with a relationship labelled r3, as shown in Figure 4.3.

Next, we treat the fact type r2, which constitutes an optional functional role. This means that we have a one-to-many association between the corresponding entity types A and C. In this case, the first step remains the same, as we first have to transform the involved entity types into their identification. What changes is the way that we treat the fact type r2. This time, we cannot express the fact type r2 by just adding relationships in the graph model, because this strategy might cause a violation of the uniqueness constraint. In order to better understand this reasoning, we extend the example of Figure 4.1 by adding some sample populations, resulting in the example of Figure 4.4. Transforming the population of fact type r2 according to the strategy that we followed in the case of r3 might cause a violation of the uniqueness constraint, as shown in Figure 4.5, because there is no facility in Neo4j that allows us to constrain the occurrence of nodes in particular relationships. Practically, this means that although we can create a uniqueness constraint on nodes in Neo4j, this does not guarantee that they will appear only once as part of a particular labelled relationship. This weakness of Neo4j derives from the fact that, unlike relational databases, where the instances of an entity type are defined separately for each fact type in which the entity type participates, in graph databases the instances of the entity types are defined only once and relationships are then used to add structure.

For that reason, we need to adopt the strategy discussed above and insert intermediate meta-nodes to express the participation of particular nodes in a relationship. By doing that, we can apply a uniqueness constraint on the property that refers to the group of nodes corresponding to the constrained entity type of the conceptual level. This approach leads to the graph model scheme illustrated in Figure 4.6. In order to guarantee that no instance of A is connected with more than one instance of C, we apply a uniqueness constraint on the property Acode of the r2 meta-node. We treat the case of total functional roles similarly. This means that in the running example, we need to insert intermediate meta-nodes in order to transform the fact type r1 that connects the entity types A and B. Then, we can apply the total and uniqueness constraints on the Acode property of the meta-node r1, as we can see in the graph model scheme of Figure 4.7.
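The two cases above can be sketched in Cypher as follows. This is a minimal sketch that assumes the constraint syntax of the Neo4j 2.x manual and an empty database. The relationship type ROLE that links a meta-node to the participating nodes, the property names Ccode and Dcode, the assumption that r3 connects A and D, and all sample values are illustrative choices, since the actual naming is defined by the figures.

```cypher
// Identification: Acode must be unique among the nodes labelled A
CREATE CONSTRAINT ON (a:A) ASSERT a.Acode IS UNIQUE;

// Functional role r2: an Acode value may occur in at most one r2 meta-node
CREATE CONSTRAINT ON (f:r2) ASSERT f.Acode IS UNIQUE;

// Sample population (hypothetical values):
// r3 (compound uniqueness) becomes a plain relationship between an A and a D node,
// r2 (functional role) becomes a meta-node that carries the constrained identifier
CREATE (a:A {Acode: 'a1'}), (c:C {Ccode: 'c1'}), (d:D {Dcode: 'd1'})
CREATE (a)-[:r3]->(d)
CREATE (f:r2 {Acode: a.Acode, Ccode: c.Ccode})
CREATE (a)-[:ROLE]->(f), (c)-[:ROLE]->(f);

// A second r2 fact for 'a1' is rejected by the constraint on the r2 meta-nodes
CREATE (:r2 {Acode: 'a1', Ccode: 'c2'});
```

The last statement is expected to fail, which is the behaviour that a plain r2 relationship between the A and C nodes could not provide.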

Figure 4.1: Simple entity types transformation.

Figure 4.2: Simple entity types identification.

Figure 4.3: Transformation of fact type with compound constraint.

Figure 4.4: Simple entity types transformation.


Figure 4.5: Population violation in optional functional role.

Figure 4.6: Optional functional role in graph model.

Figure 4.7: Total functional role in graph model.

4.2.1.1 Bridge Types

Bridge types constitute the simplest form of binary fact types, since they associate concrete entity types with abstract label types. Abstract label types are usually part of the identification, or they act like properties of the concrete entity types. This characteristic makes bridge types easy to transform into the graph model: when we specify the identification of the involved entity types, the corresponding label types are either absorbed by the entity type directly at the ORM level, or they act like properties and, as such, can be absorbed by the graph nodes at the graph model level. The case in which the label type identifies the corresponding entity type is illustrated in the example of Figure 4.8. In this example, entity type A is identified by the label B; as a result, our running example is formulated as shown in Figure 4.9, and the target graph model that is produced is that of Figure 4.10. In the case that label B is not part of the identification of entity type A, it is absorbed as a property by the nodes labelled as A at the graph model level.
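As a rough illustration, the two bridge type cases could look as follows in Cypher. The two fragments describe alternative schemas and are not meant to be combined; the property name Acode in the second fragment and all sample values are assumptions made for this sketch.

```cypher
// Case 1: the label type B identifies entity type A,
// so B becomes the uniquely constrained property of the A nodes
CREATE CONSTRAINT ON (a:A) ASSERT a.B IS UNIQUE;
CREATE (:A {B: 'b1'});

// Case 2 (alternative schema): A is identified by Acode,
// and B is absorbed as an ordinary property of the A nodes
CREATE CONSTRAINT ON (a:A) ASSERT a.Acode IS UNIQUE;
CREATE (:A {Acode: 'a1', B: 'b2'});
```

In both cases no separate node is created for the label type.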


Figure 4.8: ORM bridge type example.

Figure 4.9: Bridge type identification.

Figure 4.10: Transformation of bridge type example.

4.2.1.2 N-aries

After discussing the basic cases of binary fact type transformations, we introduce the case of higher order fact types (n-aries). For that purpose, we discuss the ternary fact type presented in the example of Figure 4.11. Again, the strategy that we follow is to replace the entity types with their identification and then to treat the fact type. In this case, the ternary fact type is treated similarly to the functional role cases earlier, meaning that we use meta-nodes. This approach is required when we want to express n-ary relationships in graph modelling, because it is the only way to distinguish the triples. As a result, we need to introduce a meta-node that is labelled as r and has as properties the identifiers of the entity types that are involved. In the running example, the meta-nodes r have as properties the specifiers of the entity types A, B and C, namely Acode, Bcode and Ccode. The combination of Acode and Bcode is the specifier for the fact type r, so we should apply a compound uniqueness constraint when we transform this relation to the graph model, as shown in Figure 4.12. However, technically, it is not possible to define a uniqueness constraint that applies to a combination of properties in Neo4j. This can only be achieved by introducing a new property that concatenates the constrained properties. We can then use this single property in order to handle compound uniqueness constraints. This strategy will also be used in other cases where we need to apply a uniqueness constraint on multiple properties at once. As a result, the graph model for the ternary fact type r is formulated as shown in Figure 4.13.
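The concatenation workaround can be sketched as follows. The sketch assumes that nodes labelled A, B and C with string-valued codes already exist; the property name AB_key, the relationship type ROLE, the separator '|' and the sample values are hypothetical choices.

```cypher
// AB_key stands in for the compound uniqueness constraint on (Acode, Bcode)
CREATE CONSTRAINT ON (f:r) ASSERT f.AB_key IS UNIQUE;

// Connect existing A, B and C nodes through a meta-node labelled r
MATCH (a:A {Acode: 'a1'}), (b:B {Bcode: 'b1'}), (c:C {Ccode: 'c1'})
CREATE (f:r {Acode: a.Acode, Bcode: b.Bcode, Ccode: c.Ccode,
             AB_key: a.Acode + '|' + b.Bcode})
CREATE (a)-[:ROLE]->(f), (b)-[:ROLE]->(f), (c)-[:ROLE]->(f);
```

A separator that cannot occur inside the codes is used so that different pairs never produce the same key; without it, the pairs ('ab', 'c') and ('a', 'bc') would both yield 'abc'.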


Figure 4.11: Ternary fact type transformation.

Figure 4.12: Ternary fact type in graph model.

Figure 4.13: Ternary fact type in graph model.

4.2.1.3 Complex Identification

In complex identification, an entity type is identified by the combination of two or more functional roles. In order to express that combined identification in the graph model, we need a mechanism similar to the one used in the ternary fact type representation, meaning that we use a list of values to uniquely identify the node instances in the graph model. The transformation of the complex identification can be further illustrated using the example of Figure 4.14. In this example, entity type A is identified by the unique combination of Cprop and Bcode, as shown in Figure 4.15. Now that we have specified the unique identifier for entity type A, we can define the transformation rule that prescribes the creation of the corresponding graph model representation. The instances of entity type A are transformed into separate nodes in the graph model, labelled with the entity type's name A. The nodes labelled as A are then identified by a code that consists of a list of values derived from the combination of Cprop and Bcode. By applying this transformation rule to our example, the resulting graph representation is formulated as shown in Figure 4.16.

Figure 4.14: ORM complex identification example.

Figure 4.15: ORM complex identification example.

Figure 4.16: Transformation of complex identification example.

4.2.2 Objectifications

The general rule when we deal with objectification transformations is to treat them as simple entity types. More specifically, when we transform objectifications, we first need to define their identification and then map the instances of the objectification type into unique labelled nodes in the graph model. After doing that, we move on to the fact types that involve our objectifications, according to Section 4.2.1. In Figure 4.17, we can see a simple example of an objectified fact type. The identification of the objectified fact type is the combination of the Acode and Bcode values, as shown in Figure 4.18. However, this compound identification cannot be applied directly in the graph model, since the graph model does not provide appropriate mechanisms for applying compound uniqueness constraints on nodes. Instead, we need to employ the same trick that we used in the complex identification transformation. As a result, the instances of the objectified fact type can be transformed into labelled nodes in the graph model, which are uniquely identified by the combination of the Acode and Bcode properties inside them, as shown in Figure 4.19. As we can see, in this case we encounter the same issue as in the case of the ternary fact types (Section 4.2.1.2) with regard to the compound uniqueness constraint, and for that reason we apply a property concatenation in order to handle the situation. The way in which we can apply the property value concatenation in the graph model depends on the data types used. We can either use string concatenation or introduce an array data type for the concatenated property values. The former approach requires the concatenated property values to be of string type, while the latter approach provides us with more flexibility, since we can combine property values of different data types. Furthermore, this figure reminds us of the graph models that were derived from the functional role transformations in Section 4.2.1. In terms of language comparison, this means that in the graph modelling language we treat ORM fact types as objectifications and vice versa.

Furthermore, objectifications affect the way we treat the compound uniqueness constraints that we discussed in Section 4.2.1. If a fact type with a compound uniqueness constraint is objectified, then intermediate meta-nodes need to be implemented in the graph model in order to support the concept of objectification. This can be further illustrated if we compare the transformation of the non-objectified and the objectified fact type with a compound uniqueness constraint, shown in Figure 4.3 and Figure 4.19 respectively. In the non-objectified case, we represent the fact type as a simple relationship in the graph model, while in the objectified case, we introduce an intermediate meta-node. In the next step, we deal with the fact types that involve our objectification, and we end up with the graph model of Figure 4.20. What is interesting here is that, when an objectification is involved in a fact type as a functional role, the newly inserted meta-node reminds us of the way we treat ternary fact types, with the difference that in this case we have an extra node to represent our objectification as a separate entity. This seems reasonable, as the given example is equivalent to an ORM ternary fact type. As a consequence, we could merge the nodes labelled as r1 and r2 in this case, using a single node with multiple labels, as shown in Figure 4.21. This would help us to minimize redundancies; however, it would lead us to mix several concepts in a single node. Since our approach focuses on efficiency in terms of language transformation and not in terms of redundancy elimination, we prefer to adopt the granulated approach.
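A possible Cypher rendering of the granulated approach is sketched below, to be read on its own against an empty schema. It assumes that r1 is the objectified fact type over A and B, that r2 connects the objectification to an entity type C, and that the codes are strings; the property name AB_key, the relationship type ROLE, the separator '|' and the sample values are hypothetical.

```cypher
// Objectified fact type r1: one node per (Acode, Bcode) pair,
// identified by the concatenated property AB_key
CREATE CONSTRAINT ON (o:r1) ASSERT o.AB_key IS UNIQUE;

MATCH (a:A {Acode: 'a1'}), (b:B {Bcode: 'b1'})
CREATE (o:r1 {Acode: a.Acode, Bcode: b.Bcode, AB_key: a.Acode + '|' + b.Bcode})
CREATE (a)-[:ROLE]->(o), (b)-[:ROLE]->(o);

// Fact type r2, in which the objectification plays a functional role,
// gets its own meta-node that refers to the objectified node by its key
CREATE CONSTRAINT ON (f:r2) ASSERT f.AB_key IS UNIQUE;

MATCH (o:r1 {AB_key: 'a1|b1'}), (c:C {Ccode: 'c1'})
CREATE (f:r2 {AB_key: o.AB_key, Ccode: c.Ccode})
CREATE (o)-[:ROLE]->(f), (c)-[:ROLE]->(f);
```

The merged variant of Figure 4.21 would instead use a single node carrying both labels, for example (:r1:r2 {...}), at the cost of mixing the two concepts in one node.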

Figure 4.17: Objectification example in ORM.

Figure 4.18: Objectification identification example.


Figure 4.19: Objectification in the graph model.

Figure 4.20: Objectification example transformation into graph model.

Figure 4.21: Objectification example transformation with minimal redundancies.

4.2.3 Specializations

In the specialization case, the properties of the specialized object type are inherited downwards, as we already mentioned in Section 2.1.3. As a consequence, the identification of the specialized object types is also inherited from the corresponding super-type. This characteristic of specialized object types guides us when we transform specializations from ORM to the graph model. More specifically, when we transform specializations, we need to take into consideration that the specialized object types constitute distinct subsets of the parent object type and, as a result, their specifier instances should be mutually exclusive.

Turning to the transformation process that we study in this paper, when we deal with specializations we can choose between three main approaches: transformation by partition, by separation and by absorption. Before we go deeper into the explanation of these three approaches, it is useful to introduce a working example that will help us to comprehend the differences among them. In Figure 4.22, we can see a simple information structure consisting of the object type A and two subtypes B and C, along with the corresponding properties.

In the transformation by partition, each of the subtypes B and C is combined with the properties of the super-type A, resulting in a graph model consisting of two groups of multi-labelled nodes, [A,B] and [A,C] respectively, as shown in Figure 4.23. Using this approach, we can guarantee that no data integrity violation will occur during the transformation of the conceptual model into the internal graph model, because the presence of the common label A enables us to apply a uniqueness constraint on the common specifier A-ID in the graph model. This means that all the node instances will be unique, which implicitly guarantees that the instances of A-ID in the nodes labelled as B and C are mutually exclusive.

The next approach that we discuss is the transformation by separation. This approach seems to be the most suitable for the graph model, as we use separate nodes to represent the objects that are involved in the specialization relation. More specifically, in our example, we would represent A, B and C in separately labelled nodes. However, this approach entails the danger of violating the specialization concept, because the graph model does not provide appropriate mechanisms to rule out the case in which both a B and a C instance are connected to a single A instance. This approach requires that B and C instances are mutually exclusive, and this constraint cannot be expressed directly in the graph model. Instead, we can obtain it indirectly by using a common label and then applying a uniqueness constraint on their common identifier. For our running example, this leads to a graph model like the one in Figure 4.24.

The third possible approach is the transformation by absorption. This approach is based on merging all the subtypes with their super-type. In the graph model, this results in nodes that inherit their label from the super-type and have as properties the properties of all the involved objects. In the relational model, this would be an issue because there would be many null values, but in the graph model this does not apply because of the flexibility that is provided concerning the properties of nodes. In the graph model, we specify the properties of a node during the creation of the node, and as a result we define only the properties that we need each time. As a consequence, the transformation by absorption is quite similar to the transformation by partition, if we consider that in the created nodes we would integrate the properties of the super-type along with the properties of the active subtype each time. This can be further illustrated with our running example. Using the transformation by absorption, we end up with nodes that are labelled as A and have as properties the Aproperties along with either the Bproperties or the Cproperties. This is similar to the case of Figure 4.23, with the difference that the B and C labels are omitted, as we can see in Figure 4.25.
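The partition and absorption approaches can be sketched in Cypher as follows. The property name A_ID is a stand-in for the specifier written as A-ID in the text, and Aprop, Bprop and Cprop are hypothetical placeholders for the properties of Figure 4.22.

```cypher
// The common label A carries the uniqueness constraint on the shared specifier
CREATE CONSTRAINT ON (a:A) ASSERT a.A_ID IS UNIQUE;

// Partition: every instance is a multi-labelled node, either [A,B] or [A,C]
CREATE (:A:B {A_ID: 1, Aprop: 'x', Bprop: 'y'});
CREATE (:A:C {A_ID: 2, Aprop: 'x', Cprop: 'z'});

// Rejected: A_ID 1 already exists on a node labelled A,
// so the same instance cannot also be stored as a C
CREATE (:A:C {A_ID: 1, Aprop: 'x', Cprop: 'z'});

// Absorption: the same idea without the subtype labels
CREATE (:A {A_ID: 3, Aprop: 'x', Bprop: 'y'});
```

Because the constraint is attached to the shared label A, mutual exclusion of the B and C instances follows from uniqueness alone.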


Figure 4.22: Specialization transformation example.

Figure 4.23: Specialization transformation by partition.

Figure 4.24: Specialization transformation by separation.

Figure 4.25: Specialization transformation by absorption.

4.2.4 Generalizations

The generalization case differs from the specialization case in that facts are inherited upwards (see Section 2.1.4), meaning that the generalization inherits its identification from the object types that it generalizes.

Figure 4.26: Generalization transformation example.

Similarly to the specialization case, there are several ways to transform generalizations. First, we treat the transformation by absorption. This approach is not appropriate for a direct transformation from the ORM model to the graph model, since the identification of the generalized entity type derives from different specifiers. This hinders us from applying uniqueness and totality on the inherited specifiers, since one of the two will always get a null value. In our running example, the generalized entity A is identified either by B-id or by C-id, which means that we cannot utilise both specifiers in the graph nodes created by the transformation process, and as a result the graph representation of Figure 4.27 is invalid in terms of language consistency. This happens because we can guarantee uniqueness on B-id and C-id but not totality, since one of them will always be null. In order to apply the absorption approach, we need to use some trick to switch the identification from the generalized entities to a global entity. However, the result would then effectively be handled as a specialization. After applying a global specifier to the generalization entity of our example, the ORM model for the transformation is adjusted as shown in Figure 4.28. This approach is not incorrect, but it is not desirable either, because it adds complexity to our transformation procedure; as a result, we choose not to look further into it.

Figure 4.27: Generalization transformation by absorption.

Figure 4.28: Adjusted generalization orm model.

Turning to the partition approach, we can transform our ORM model by creating two different types of labelled nodes, one for each uniquely identified entity B and C. Then, we also add the A-properties to the B and C labelled nodes respectively. In this way, we obtain the upward inheritance that is the basic characteristic of generalization. By following this approach, our example ORM model is transformed as shown in Figure 4.29. As we can see in the figure, the generalized entity A is integrated into the entities B and C and inherits their identification specifiers respectively. Transformation by partition clearly entails some redundancy for the generalized object type, since its instances may be repeated in several instances of the specifier entities. In our running example, this translates into redundancy of A-properties, which may be repeated in instances of the B and/or C entities.

Figure 4.29: Generalization transformation by partition.

The separation approach seems to be the best-fitting approach for the transformation into the graph model. As we already mentioned in Section 3.1.3, the graph model is designed for relations. The question that arises is how this feature of graph models influences the transformation by separation. In the transformation by separation, we create a node for each of our entities, labelled with the corresponding entity name. Then we apply appropriate constraints on the nodes derived from the generalized entities so that we guarantee identification. In our running example, we need to apply appropriate constraints on B-id and C-id in order to guarantee that the node instances labelled as B and C respectively are unique and total. Finally, we add the B-properties and C-properties as node properties to the nodes with the corresponding label, and we also create nodes labelled as A, containing the A-properties. In the relational model, this would require some sort of identifier to be implemented on the A entity so that we can easily refer to A instances when we connect them with other entities, such as B and C. This is not required in the graph model, since the connection to A instances is obtained using graph relationships; we do not need to involve any reference key when we want to relate two entities. By following these rules on our running example, we create the graph model shown in Figure 4.30.
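The separation approach might be written in Cypher along the following lines. Only the uniqueness part of the constraints is shown. The property names B_id and C_id are stand-ins for the specifiers written as B-id and C-id in the text, while the relationship type IS_A and the properties Aprop, Bprop and Cprop are hypothetical.

```cypher
// Identification of the generalized entities B and C
CREATE CONSTRAINT ON (b:B) ASSERT b.B_id IS UNIQUE;
CREATE CONSTRAINT ON (c:C) ASSERT c.C_id IS UNIQUE;

// Each A node holds only the A-properties and is reached through a relationship,
// so it needs no identifier of its own
CREATE (:B {B_id: 'b1', Bprop: 'y'})-[:IS_A]->(:A {Aprop: 'x'});
CREATE (:C {C_id: 'c1', Cprop: 'z'})-[:IS_A]->(:A {Aprop: 'x'});
```

The A-properties of a given B instance can then be read by following the relationship from the B node, for example with MATCH (b:B {B_id: 'b1'})-[:IS_A]->(a:A) RETURN a.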

Figure 4.30: Generalization transformation by separation.

4.2.5 Power Types (or Set Types)

Power types require a particular type of constraint in order to be expressed effectively in the graph model. More specifically, they require an existential uniqueness constraint that is applied to the elements constituting the sets that populate a power type entity. This has been discussed in detail in Section 2.1.5. The power type construct cannot be expressed directly in the graph model, because the graph model does not provide appropriate built-in existential uniqueness constraints, and as a result we cannot guarantee the uniqueness of power type instances. Instead, we need to express existential uniqueness by using workarounds like those used in the structures that require compound uniqueness constraints. More specifically, we use intermediate meta-nodes so that we are able to apply appropriate constraints on the codes that we store in the meta-nodes. To illustrate this transformation, we use the example shown in Figure 4.31. In this example, the Committee entity type constitutes a power type of the Person entity type, meaning that its population consists of unique sets of Person entity instances. To transform this into a graph representation, we create Committee labelled nodes for the Committee entity instances and separate Person labelled nodes for the Person entity instances. To uniquely identify the sets of persons that constitute a committee, we need to introduce an artificial identifier for the Committee labelled nodes. This artificial identifier is a list of values that compose a reference to a set of Person labelled nodes. Thus, the target graph representation that derives from applying this transformation rule is formulated as shown in Figure 4.32.
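One way to approximate this rule in Cypher is sketched below. Instead of a list, the sketch stores the artificial identifier as a concatenated string, in line with the workaround used for compound uniqueness constraints. The identifying property name of Person, the property memberKey, the relationship type MEMBER_OF and the sample persons are assumptions made for illustration.

```cypher
// memberKey is the artificial identifier of a committee
CREATE CONSTRAINT ON (c:Committee) ASSERT c.memberKey IS UNIQUE;

// Sample persons, identified here by a hypothetical name property
CREATE (:Person {name: 'Ann'}), (:Person {name: 'Bob'});

// Build the committee {Ann, Bob}: the members are sorted first, so that the
// same set yields the same key regardless of the order in which it is given
MATCH (p:Person)
WHERE p.name IN ['Bob', 'Ann']
WITH p ORDER BY p.name
WITH collect(p) AS members, collect(p.name) AS names
CREATE (c:Committee {memberKey: reduce(k = '', n IN names | k + '|' + n)})
FOREACH (m IN members | CREATE (m)-[:MEMBER_OF]->(c));
```

Running the last statement a second time for the same set of persons is expected to fail on the uniqueness constraint, which is how the sketch prevents two committees with identical members.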

Figure 4.31: ORM power type example.

Figure 4.32: ORM power type example.

4.2.6 Transformation Rules Summary

In this chapter, we discussed some of the most common transformation rules that we need in order to transform most real-world ORM models into graph databases. We did not analyse data structures such as sequence types and schema types, which rarely apply to real-world use cases. Transformation rules for these types are out of scope for this paper and can be included in a future paper. To sum up the transformation rules that we defined in this chapter, we provide the following table of transformations.


Table 4.1: Transformation rules. The table has the columns Rule Code, ORM and Graph and lists Rule 1 to Rule 13; the ORM and Graph cells contain diagrams.

4.3 Transformation Algorithm

In the previous sections, we defined the transformation rules that we need to apply to particular ORM constructs in order to transform them into the corresponding graph model constructs. In order to transform a whole information structure, we need to apply several of those rules. In this section, we provide the transformation algorithm to follow when transforming an ORM information structure into a graph model. In this algorithm, we use the rule codes from Table 4.1.

g := Empty graph model;
for i in Object Types do
    Transform i into its identification;    // e.g. see Figure 4.9
    case type i of
        Specialization: Extend g by applying Rule 8 or Rule 9 or Rule 10;    // see Section 4.2.3
        Generalization: Extend g by applying Rule 11 or Rule 12;    // see Section 4.2.4
        Objectification: Extend g by applying Rule 7;    // see Section 4.2.2
        Complex Identification: Extend g by applying Rule 6;    // see Section 4.2.1.3
        Power Type: Extend g by applying Rule 13;    // see Section 4.2.5
        Fact Type: Do nothing;
    end
    for j in Fact Types involved with i do
        Transform j into its identification;
        case type j of
            Bridge Type: Modify i by applying Rule 1;    // see Section 4.2.1.1
            Total Functional Role: Extend g by applying Rule 2;    // see Section 4.2.1
            Optional Functional Role: Extend g by applying Rule 3;    // see Section 4.2.1
            Compound Uniqueness: Extend g by applying Rule 4;    // see Section 4.2.1
            N-ary: Extend g by applying Rule 5;    // see Section 4.2.1.2
        end
    end for
end for


Chapter 5

Validation

5.1 Validation Framework

In this chapter we will validate that the transformation framework provided in Chapter 4 is an efficient way to transform an ORM model into a Graph database representation, in terms of language. To this end, we will use a method which is based on transitive logic. More specifically, we will use a well-grounded example of a transformation from an ORM model to a Relational model, and we will apply our transformation framework to the same ORM model in order to produce an equivalent Graph model. The model that we will use as baseline is the so-called presidential database. In order to show that the models are equivalent, we will examine the populations of the Relational and the Graph model by executing several queries against each of them. If the query result sets are equivalent, then we can assume that our Relational and Graph models are also equivalent. Then, given that the Relational model constitutes an efficient transformation of the initial ORM model, we can claim that our Graph model is also an efficient representation of the ORM model.
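The comparison of result sets can be sketched in a few lines of Python. The helper below is an assumption about how such a check might be automated, not the procedure used in the thesis: it treats both results as multisets of rows, so row order is ignored while duplicate rows still count. The two sample results are small hand-written lists.

```python
from collections import Counter


def same_result_set(sql_rows, cql_rows):
    """Compare two query results as multisets of rows.

    Row order is ignored, because neither query specifies an ordering,
    but a row that appears twice on one side must appear twice on the
    other side as well.
    """
    return Counter(map(tuple, sql_rows)) == Counter(map(tuple, cql_rows))


# Hand-written sample results in the shapes that database drivers
# commonly return: tuples on one side, lists on the other.
sql_rows = [("Adams J",), ("Washington G",)]
cql_rows = [["Washington G"], ["Adams J"]]

print(same_result_set(sql_rows, cql_rows))
```

A check of this kind supports the equivalence of the two populations with respect to the queries that were actually executed.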

5.2 Example Transformation

5.2.1 Presidential ORM

The ORM model that we will use for the validation of our transformation mechanism is provided in Figure 5.1. This model is a good example, because it represents a high percentage of the real-world data structures that information structure designers usually have to deal with. Our next step is to create the meta-model that will help us produce our target graph database. In order to create this meta-model, we will follow a divide and conquer approach, according to which we will split our initial ORM model into smaller areas. In order to decide how to divide the model into smaller pieces, we will rely on the granular structures that we discussed in Chapter 4, and which are summarized in Table 4.1.

Figure 5.1: Presidential ORM model.

5.2.2 Presidential Graph Meta-model

By applying the transformation rules that we defined in Chapter 4 on the different areas of our ORM model, we will create the graph meta-model that we need in order to create the statements that produce our graph database. More specifically, we will apply transformation rules for optional and total functional roles, compound uniqueness, ternaries, objectifications and specialisations.

We will start building our graph meta-model by choosing as starting point the specialisation of the entity type Person into entity type President. For that purpose, we will make use of the transformation by separation rule, which means that we will separate entities Person and President in different labelled nodes which are connected with a relationship that implies that the set of President instances is a subset of the Person entity instances.

Then, we will deal with the objectified fact type Marriage, which involves Person and President entity types. This means that the identification of the objectified fact type Marriage is specified by the combination of the identifications of Person and President entity types. By following the rule that applies on objectifications, we will create a labelled meta-node for the entity Marriage, and we will connect it with Person and President labelled nodes using appropriate relationships.

In the next step, we will deal with optional and total functional roles, compound uniqueness and ternaries. First we need to take into consideration the bridge types which are involved in these structures. Bridge types are not presented as separate meta-nodes; instead, the corresponding label types are included as properties in the graph labelled nodes which are created for the connected entity types. In our ORM model, the label types Year, Age, Nr of years and Nr of children are involved in several bridge types with various roles. These labels will not be represented as separate nodes in our graph meta-model, but instead, they will participate as properties in the graph labelled nodes that represent the corresponding entity types.

Turning to the variations of the fact types of the examined ORM model, we will start with those that involve total functional roles. We have four instances of total functional roles which are not bridge types. One of them is the fact type which involves the Person and the Election entity types. By following the corresponding transformation rule, this structure is presented in the graph meta-model as a labelled meta-node identified by the Election identification, and having as property the identification of the Person functional role. Another instance of a total functional role is the involvement of the Administration entity type in the fact type that connects it with the Person entity type. Again, according to the transformation rule that applies for total functional roles, we need to introduce a labelled meta-node in our graph meta-model that is identified by the Administration entity identification and has as property the identification of the Person entity type. Similarly, we treat the fact types that involve the President entity type as a total functional role attached to the Party and State entity types respectively.

Now we can move on to the fact types which involve compound uniqueness. We have only one instance of that type in our source model, and this involves the President and Hobby entity types. By following the corresponding transformation rule, it is simple to transform this structure into the graph meta-model by creating labelled meta-nodes for both entity types and connecting them directly using an appropriate relationship.

The last structure type that appears in the presidential ORM model and needs to be treated is the ternary fact type. There is a ternary fact type instance that involves Person, Election and Nr of votes, and a second one that involves the Person entity in two different functional roles related with the Administration entity type. In the former case, the ternary conceptually represents the score that has been obtained by a candidate in a specific election. This ternary is transformed into the graph meta-model by creating a meta-node labelled as Score, which is identified by the combination of the identifications that specify the Person and Election functional role instances. In the latter case, the ternary describes the concept of the vice president of an administration. This ternary is transformed into the graph meta-model by introducing a meta-node labelled as Vice President, which is identified by the combination of the identifications of all the involved functional roles.

All these transformations are merged together to draw up the graph meta-model that is presented in Figure 5.2, which will be used as the template to produce our target graph database.

Figure 5.2: Presidential Graph Meta-model.

5.2.3 Cypher Statements

Now that we have defined our graph meta-model, we are ready to proceed with the preparation of the Cypher statements which produce our graph database. We will separate the statements into three categories: first we will produce our data integrity constraint statements, and then we will proceed with the statements that create labelled nodes and the statements that create relationships.

5.2.3.1 Data Integrity Constraints

We will start the building of our graph database by applying appropriate data integrity constraints on the labelled nodes, based on the meta-model that we created in the previous section. The Cypher statements to create those constraints are the following:

CREATE CONSTRAINT ON (n:Person) ASSERT n.name IS UNIQUE

CREATE CONSTRAINT ON (n:President) ASSERT n.name IS UNIQUE

CREATE CONSTRAINT ON (n:Vice_President) ASSERT n.vp_id IS UNIQUE

CREATE CONSTRAINT ON (n:Administration) ASSERT n.admin_nr IS UNIQUE

CREATE CONSTRAINT ON (n:Presidency) ASSERT n.admin IS UNIQUE

CREATE CONSTRAINT ON (n:Marriage) ASSERT n.id IS UNIQUE

CREATE CONSTRAINT ON (n:Winner) ASSERT n.election_year IS UNIQUE

CREATE CONSTRAINT ON (n:Score) ASSERT n.id IS UNIQUE

CREATE CONSTRAINT ON (n:Election) ASSERT n.election_year IS UNIQUE

CREATE CONSTRAINT ON (n:Hobby) ASSERT n.hobby_name IS UNIQUE

CREATE CONSTRAINT ON (n:Birth_State) ASSERT n.president_name IS UNIQUE

CREATE CONSTRAINT ON (n:State) ASSERT n.state_name IS UNIQUE

CREATE CONSTRAINT ON (n:Member) ASSERT n.president_name IS UNIQUE

CREATE CONSTRAINT ON (n:Party) ASSERT n.party_name IS UNIQUE
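Since every constraint above follows the same pattern, the statements could also be generated from a mapping of labels to identifying properties. The following Python sketch shows the idea for four of the labels. The mapping, the function name uniqueness_constraint and the spelling of identifiers with underscores are assumptions made for illustration; the statement text mirrors the syntax used in this section.

```python
# Label -> identifying property, for four of the labelled nodes above.
# Spelling the identifiers with underscores is an assumption.
IDENTIFIERS = {
    "Person": "name",
    "President": "name",
    "Hobby": "hobby_name",
    "Party": "party_name",
}


def uniqueness_constraint(label, prop):
    """Render one uniqueness constraint in the syntax used in this section."""
    return f"CREATE CONSTRAINT ON (n:{label}) ASSERT n.{prop} IS UNIQUE"


statements = [uniqueness_constraint(label, prop)
              for label, prop in IDENTIFIERS.items()]

for statement in statements:
    print(statement)
```

Keeping the identifying property of each label in one place also makes it easier to check that constraints, node templates and relationship templates refer to the same property names.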

5.2.3.2 Labelled Nodes

After having all the data integrity constraints in place, we can proceed with the next step, which is to populate our graph database with labelled node instances based on the meta-model that we defined in the previous section. For each of the labelled nodes which are specified in Figure 5.2, we will need to create an appropriate Cypher statement template, so that we can reuse it and reproduce our target database. The data that we will use in order to populate our nodes is available in Appendix A. The statement templates that we created in order to load our data into the Neo4j graph database are the following:

CREATE (:Person {name: row.name})

CREATE (:President {name: row.name, birth_year: row.birth_yr, death_age: row.death_age, years_served: row.years_serv})

CREATE (:Vice_President {vp_id: [row.admin_nr, row.president_name, row.vice_pres_name]})

CREATE (:Administration {number: row.admin_nr, year_inaugurated: row.year_inaugurated})

CREATE (:Presidency {admin_nr: row.admin_nr, person_name: row.pres_name})

CREATE (:Marriage {id: [row.president_name, row.spouse_name], year: row.mar_year, president_age: row.president_age, spouse_age: row.spouse_age, nr_of_children: row.nr_of_children})

CREATE (:Winner {president_name: row.president_name, election_year: row.election_year})

CREATE (:Score {id: [row.candidate, row.election_year], votes: row.votes})

CREATE (:Election {election_year: row.election_year})

CREATE (:Hobby {hobby_name: row.hobby_name})

CREATE (:Birth_State {president_name: row.president_name, state_name: row.state_born})

CREATE (:State {state_name: row.state_name, union_entered_year: row.union_entered_year})

CREATE (:Member {president_name: row.president_name, party_name: row.party_name})

CREATE (:Party {party_name: row.party_name})

5.2.3.3 Relationships

The last step that we need to take is to create the relationships which connect the nodes that we created in the previous step. For each relationship that derives from our meta-model, we will create a Cypher statement template. Then, based on these templates, we will create the relationships for the data in Appendix A. The statement templates that we will use are the following:

MATCH (n:Person {name: row.person_name})
MATCH (m:President {name: row.president_name, birth_year: row.birth_yr, death_age: row.death_age, years_served: row.years_served})
MERGE (m)-[r:ISA]->(n)

MATCH (n:Person {name: row.president_name})
MATCH (m:Person {name: row.vice_pres_name})
MATCH (v:Vice_President {vp_id: [row.admin_nr, row.president_name, row.vice_pres_name]})
MERGE (n)-[r1:president_in]->(v)<-[r2:vice_president_in]-(m)

MATCH (v:Vice_President {vp_id: [row.admin_nr, row.president_name, row.vice_pres_name]})
MATCH (a:Administration {number: row.admin_nr})
MERGE (a)-[r:was_administered_by]->(v)

MATCH (n:Person {name: row.president_name})
MATCH (m:Administration {number: row.admin_nr})
MATCH (p:Presidency {admin_nr: row.admin_nr, person_name: row.president_name})
MERGE (n)-[r1:being_president_of]->(p)<-[r2:having_as_president]-(m)

MATCH (n:Person {name: row.spouse_name})
MATCH (m:President {name: row.president_name})
MATCH (s:Marriage {id: [row.president_name, row.spouse_name]})
MERGE (n)-[r1:spouse_of]->(s)<-[r2:married_with]-(m)

MATCH (n:Person {name: row.spouse_name})
MATCH (m:President {name: row.pres_name})
MERGE (n)-[r:spouse_of]->(m)

MATCH (n:Person {name: row.spouse_name})
MATCH (m:President {name: row.pres_name})
MERGE (m)-[r:married_with]->(n)

MATCH (m:Person {name: row.candidate})
MATCH (w:Winner {president_name: row.candidate, election_year: row.election_year})
MERGE (m)-[r:having_won]->(w)

MATCH (m:Person {name: row.candidate})
MATCH (s:Score {id: [row.candidate, row.election_year]})
MERGE (m)-[r:votes_obtained]->(s)

MATCH (w:Winner {election_year: row.election_year})
MATCH (s:Score {id: [row.candidate, row.election_year]})
MATCH (e:Election {election_year: row.election_year})
MERGE (w)<-[r1:being_won_by]-(e)-[r2:election_votes]->(s)

MATCH (m:President {name: row.president_name})
MATCH (h:Hobby {hobby_name: row.hobby_name})
MERGE (m)-[r:having_hobby]->(h)

MATCH (m:President {name: row.president_name})
MATCH (b:Birth_State {president_name: row.president_name})
MERGE (m)-[r:born_in]->(b)

MATCH (m:President {name: row.president_name})
MATCH (p:Member {president_name: row.president_name})
MERGE (m)-[r:member_of]->(p)

MATCH (m:Member {party_name: row.party_name})
MATCH (p:Party {name: row.party_name})
MERGE (m)<-[r:has_member]-(p)

MATCH (m:Birth_State {state_name: row.state_name})
MATCH (s:State {state_name: row.state_name})
MERGE (m)<-[r:member_of]-(s)

5.3 Validation Queries

To validate our transformation framework, we need to compare the graph database that we created with the corresponding relational database. To do this, we need to define some common questions that will be examined against both databases, by using SQL and CQL queries respectively. Then we will compare the returned data sets to show that the databases are equal. If the databases are shown to be equal, then we can claim that our transformation framework is valid. The questions that we will use for that purpose are originally presented in Chapter 5 of the lecture notes, and they are formulated as follows:

5.3.1 Query 1

Natural Language Query
Show all presidents.

ORC
LIST President

SQL Query
To list all the presidents in SQL, we need to use the SELECT command in order to list all the president names that exist in the president table, which holds all instances of the president object type.

SELECT pres_name FROM president;

CQL Query
In the graph model, the population of the object type President is represented by nodes labelled as President. In order to obtain all presidents in CQL, we will need to use the MATCH command to match all the president nodes, and subsequently, we will use the RETURN command to display the resulting president names.

MATCH (n:President) RETURN n.name;

By executing this CQL statement on the Neo4j DBMS, we will get a result set which contains all the president names, and which is equal to the result set that we get from the corresponding SQL statement execution. We will not quote these result sets, because they are long lists of president names, and it is out of scope to provide long lists of text in this section. Nevertheless, we will provide some data in some of the next queries, where the result sets are more compact.

5.3.2 Query 2

Natural Language Query
Who are the president spouses?

ORC
Person being spouse of President

SQL Query
In the relational database model, all the instances of the object type Person which play the role of spouse are presented in the pres_marriage table. In order to get all instances of the persons playing the role of spouse in SQL, we will need to use the SELECT DISTINCT command against the pres_marriage table.

SELECT DISTINCT spouse_name FROM pres_marriage;

CQL Query
In the graph model, the procedure that we need to follow in order to get all the Person instances that play the role of spouse is similar to the procedure used in Query 1 (5.3.1), meaning that we will need to match a pattern and then return the resulting set of values. Particularly, we will use the MATCH command to match the pattern of any node m labelled as Person which is connected to any node p labelled as President with the relationship spouse_of, and subsequently, we will use the RETURN DISTINCT command to display the matched person instances. It is worth mentioning that in this case we use the RETURN DISTINCT command instead of the RETURN command, in order to avoid redundant values in our result set.

MATCH (m:Person)-[:spouse_of]->(p:President) RETURN DISTINCT m;

This CQL statement returns a result with all the spouse names in the population of our graph model, which is equal to the result set that is returned by the SQL statement.

5.3.3 Query 3

Natural Language Query
Which presidents have been married?

ORC
DISTINCT President having spouse

SQL Query
This query is similar to Query 2 (5.3.2).

SELECT DISTINCT pres_name FROM pres_marriage;

CQL Query
The logic behind this query is similar to the logic followed in 5.3.2.

MATCH (p:President)-[:married_with]->(m:Person) RETURN DISTINCT p.name;

As is also the case in Query 1 and Query 2, the result sets of the CQL and the corresponding SQL statement are equal.

5.3.4 Query 4

Natural Language Query
Who is the spouse of president Johnson?

ORC
Person being spouse of President with Name 'Johnson'

SQL Query
The language used in the previous two queries is extended in this case by introducing a WHERE clause in order to restrict the values in our result set, and the LIKE operator in order to express the comparison condition applied on the president name. The SQL query in this case is formulated as follows:

SELECT spouse_name FROM pres_marriage WHERE pres_name LIKE '%Johnson%';

By executing this SQL statement, we will get the names McCardle E and Taylor C A as spouse names of any presidents named 'Johnson'.

CQL Query
In CQL, again the query language is extended with the WHERE clause and the CONTAINS operator in order to restrict our result set on the president name.

MATCH (m:President)-[:married_with]->(p:Person) WHERE m.name CONTAINS 'Johnson' RETURN p

This CQL statement returns the names McCardle E and Taylor C A as spouse names of any presidents named 'Johnson'.
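The correspondence between the LIKE pattern and the CONTAINS operator can be tried out with a short Python sketch. It runs the SQL statement against an in-memory SQLite database and uses a plain substring test as a stand-in for CONTAINS. The use of SQLite, the substring stand-in and the three sample rows are assumptions made for illustration: the two Johnson rows pair the spouses named above with their presidents according to the historical record, and the Lincoln row is added only as a non-matching case.

```python
import sqlite3

# (pres_name, spouse_name); a three-row sample, not the full table.
rows = [
    ("Johnson A", "McCardle E"),
    ("Johnson L B", "Taylor C A"),
    ("Lincoln A", "Todd M"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pres_marriage (pres_name TEXT, spouse_name TEXT)")
con.executemany("INSERT INTO pres_marriage VALUES (?, ?)", rows)

# SQL side: pattern match with LIKE.
sql_result = sorted(
    r[0] for r in con.execute(
        "SELECT spouse_name FROM pres_marriage WHERE pres_name LIKE '%Johnson%'"
    )
)

# Graph side stand-in: case-sensitive substring test, like CONTAINS.
contains_result = sorted(spouse for pres, spouse in rows if "Johnson" in pres)

print(sql_result)
print(contains_result)
```

One difference to keep in mind when comparing result sets: CONTAINS is case-sensitive, whereas LIKE is case-insensitive for ASCII characters in SQLite by default, and its behaviour in other RDBMSs depends on the collation. For the capitalised pattern 'Johnson' used here, both sides return the same names.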

5.3.5 Query 5

Natural Language Query
Show the average age at death of deceased presidents.

ORC
AVERAGE Age being death age of President

SQL Query
To find the average death age of all deceased presidents, we need to introduce the AVG aggregate function in our language dictionary. The corresponding SQL statement for this query is formulated as follows:

SELECT AVG(death_age) FROM president;

The average age at death, as it is calculated by the AVG function, is the numeric value 68.86.

CQL Query
Aggregate functions are also available in the CQL dictionary. Particularly, we will extend our lexicon by adding the AVG aggregate function to calculate the average death age of all deceased presidents.

MATCH (n:President) RETURN AVG(n.death_age);

This CQL statement returns exactly the same value as the corresponding SQL statement.

5.3.6 Query 6

Natural Language Query
Show the oldest age at death of a president.

ORC
MAXIMUM Age being death age of President

SQL Query
Similarly to Query 5 (5.3.5), we will use an aggregate function to find the maximum death age of all president object type instances.

SELECT MAX(death_age) FROM president;

The oldest death age that is returned by this query has the value 90.

CQL Query
The death age of a president is prescribed as a node property in the graph model. To find the maximum value of that property among all the nodes labelled as President, we will extend our lexicon with the MAX aggregate function. Thus, the corresponding CQL statement for this query is formulated as follows:

MATCH (n:President) RETURN MAX(n.death_age);

By executing this CQL statement on the Neo4j DBMS, we will get the same value as in the case of the SQL statement, which is the numeric value 90.
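The values reported for Query 5 and Query 6 can be reproduced from the 35 rows of Table A.2 in Appendix A. The following Python sketch loads those rows into an in-memory SQLite database and runs both aggregates. The use of SQLite and the column names of the president table are assumptions made for illustration; the data is copied from Table A.2.

```python
import sqlite3

# (name, birth_year, death_age, years_served), copied from Table A.2.
PRESIDENTS = [
    ("Washington G", 1732, 67, 7), ("Adams J", 1735, 90, 4),
    ("Jefferson T", 1743, 83, 8), ("Madison J", 1751, 85, 8),
    ("Monroe J", 1758, 73, 8), ("Adams J Q", 1767, 80, 4),
    ("Jackson A", 1767, 78, 8), ("Van Buren M", 1782, 79, 4),
    ("Harrison W H", 1773, 68, 0), ("Tyler J", 1790, 71, 3),
    ("Polk J K", 1795, 53, 4), ("Taylor Z", 1784, 65, 1),
    ("Fillmore M", 1800, 74, 2), ("Pierce F", 1804, 64, 4),
    ("Buchanan J", 1791, 77, 4), ("Lincoln A", 1809, 56, 4),
    ("Johnson A", 1808, 66, 3), ("Grant U S", 1822, 63, 8),
    ("Hayes R B", 1822, 70, 4), ("Garfield J A", 1831, 49, 0),
    ("Arthur C A", 1830, 56, 3), ("Cleveland G", 1837, 71, 8),
    ("Harrison B", 1833, 67, 4), ("McKinley W", 1843, 58, 4),
    ("Roosevelt T", 1858, 60, 7), ("Taft W H", 1857, 72, 4),
    ("Wilson W", 1856, 67, 8), ("Harding W G", 1865, 57, 2),
    ("Coolidge C", 1872, 60, 5), ("Hoover H C", 1874, 90, 4),
    ("Roosevelt F D", 1882, 63, 12), ("Truman H S", 1884, 88, 7),
    ("Eisenhower D D", 1890, 79, 8), ("Kennedy J F", 1917, 46, 2),
    ("Johnson L B", 1908, 65, 5),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE president ("
    "pres_name TEXT, birth_yr INTEGER, death_age INTEGER, years_serv INTEGER)"
)
con.executemany("INSERT INTO president VALUES (?, ?, ?, ?)", PRESIDENTS)

# Query 5 and Query 6 in one statement.
avg_age, max_age = con.execute(
    "SELECT AVG(death_age), MAX(death_age) FROM president"
).fetchone()

print(round(avg_age, 2), max_age)  # 68.86 90
```

The sum of the 35 death ages is 2410, so the average is 2410 / 35, which rounds to 68.86 and agrees with the value reported for Query 5.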

5.3.7 Query 7

Natural Language Query
What is the total number of children resulting from presidential marriages?

ORC
SUM Nr of children resulting from Marriage involving President

SQL Query
This query requires us to use the SUM function to add up the values of the nr_children property of the Marriage object type.

SELECT SUM(nr_children) FROM pres_marriage;

The result of this query is the numeric value 143.

CQL Query
In the graph model, the number of children that correspond to the instances of the Marriage object type is prescribed by the nr_of_children property of the nodes labelled as Marriage. By extending our dictionary with the SUM aggregate function, we can find the total number of children resulting from presidential marriages as follows:

MATCH (n:Marriage) RETURN SUM(n.nr_of_children);

The result of this query is the numeric value 143, which is equal to the result of the SQL query.

5.3.8 Query 8

Natural Language Query
For each party, list the party name and the number of presidents born after the year 1850.

ORC
(President AND ALSO being born in Year > 1850) being member of, Name of FROM Party

SQL Query
In this case, we will use a WHERE clause to filter the president results according to the birth year restriction. Moreover, we will use the GROUP BY statement in order to aggregate president instances according to the party property values. Finally, we will use the COUNT function to find how many president instances appear for each party.

SELECT party, COUNT(pres_name) FROM president WHERE birth_yr>1850 GROUP BY party;

This SQL statement shows that there were 6 Democratic and 9 Republican presidents that were born after the year 1850.

CQL Query
The descriptor that relates presidents with their birth year is the property birth_year, which is present in nodes labelled as President. To filter the president nodes on this descriptor, we need to extend our dictionary by adding the WHERE clause. In the WHERE clause we can add the filters that we want to apply by using appropriate logical conditions. For the needs of this query, we will also add the COUNT aggregate function to our dictionary. To formulate the CQL statement for this query, we first need to match the pattern that corresponds to the information that we want to extract from our model. This is done by relating the President nodes, which hold the information about the birth year, with the Member nodes, which hold the information concerning the party that presidents belong to. Then, we filter the president instances on their birth year, and finally we use the RETURN clause to display the party names and the corresponding number of presidents.

MATCH (n:President)-[r:member_of]->(m:Member)
WHERE n.birth_year>1850
RETURN m.party_name, COUNT(n.name);

The result set of this CQL statement is equal to the corresponding SQL statement result set.

5.3.9 Query 9

Natural Language Query
For those parties which had more than 8 presidents born after 1850, list the names of the parties and the corresponding number of presidents born after 1850.

ORC
LET RecentPresidents BE President being born in Year > 1850, then:
Name of, COUNT GROUPWISE RecentPresidents being member of FROM Party

SQL Query
This query is an extension of the previous one in the sense that we only need to apply a filter on the result set of Query 8 (5.3.8). To do this in SQL, we will use the concept of sub-querying by adding Query 8 as a nested query, and we will assign its result set to the RecentPresidents descriptor. Then, we will apply a filter on the cnt property of the RecentPresidents instances to obtain our final result set.

SELECT *
FROM (SELECT party, COUNT(pres_name) AS cnt
      FROM president
      WHERE birth_yr>1850
      GROUP BY party) RecentPresidents
WHERE cnt>8;

As is implied by the result set of the previous query, only the Republican party had more than 8 presidents born after 1850. By executing this SQL statement on our testing RDBMS, we will get 9 Republican presidents as result set.

CQL Query
The CQL case is similar to the SQL case in the sense that we will use the result set of Query 8 as base, and we will extend it accordingly. First, we will need to extend our language by adding the WITH clause. This will help us to assign the results of Query 8 to the descriptors party and cnt respectively. Then, we will apply a filter on cnt using a WHERE clause.

MATCH (n:President)-[r:member_of]->(m:Member)
WHERE n.birth_year>1850
WITH m.party_name AS party, COUNT(n.name) AS cnt
WHERE cnt>8
RETURN party, cnt;

By executing this CQL statement on the Neo4j DBMS, we will get the result that we expect to get, which is 9 Republican presidents.
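The filter on an aggregated value can also be written in SQL with a HAVING clause, which is structurally close to the WITH ... WHERE combination of the CQL statement. The Python sketch below runs both SQL formulations against an in-memory SQLite database and compares the results. The HAVING variant is an alternative added here for illustration and is not used in the thesis; the five rows are hypothetical toy data, and the threshold is lowered from 8 to 2 so that the toy data yields a non-empty result.

```python
import sqlite3

# Hypothetical toy rows (pres_name, birth_yr, party); not the thesis data.
rows = [
    ("A", 1860, "P1"), ("B", 1870, "P1"), ("C", 1880, "P1"),
    ("D", 1855, "P2"), ("E", 1840, "P2"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE president (pres_name TEXT, birth_yr INTEGER, party TEXT)")
con.executemany("INSERT INTO president VALUES (?, ?, ?)", rows)

THRESHOLD = 2  # Query 9 uses 8; lowered to fit the toy data

# Formulation of this section: filter the result of a nested query.
nested = con.execute(
    "SELECT * FROM (SELECT party, COUNT(pres_name) AS cnt FROM president "
    "WHERE birth_yr > 1850 GROUP BY party) RecentPresidents WHERE cnt > ?",
    (THRESHOLD,),
).fetchall()

# Alternative: filter the groups directly with HAVING.
having = con.execute(
    "SELECT party, COUNT(pres_name) AS cnt FROM president "
    "WHERE birth_yr > 1850 GROUP BY party HAVING COUNT(pres_name) > ?",
    (THRESHOLD,),
).fetchall()

print(nested)
print(having)
```

In the toy data, party P1 has three members born after 1850 and party P2 has one, so both formulations return only P1 with a count of 3.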

5.3.10 Query 10

Natural Language Query
Count the presidents who were members of the same party and who were born in the same state. List party, state of birth, and this count.

ORC
COUNT GROUPWISE (
    President being member of Party having as member President
    INTERSECTION
    President having Name Name of President
    INTERSECTION
    President being born in State being birth place of President
) being member of, Name of FROM Party

SQL Query
To express this query in SQL, we will not use any further language components, but we will rather combine several of those components that have been used in the previous nine queries. Thus, the SQL statement for this query is formulated as follows:

SELECT a.state_born, a.party, COUNT(DISTINCT a.pres_name)
FROM president a, president b
WHERE a.party=b.party and a.state_born=b.state_born
GROUP BY a.party, a.state_born;

CQL Query
As is also the case in SQL, we do not need to extend the dictionary of CQL any further in order to accommodate this query. Instead, we will combine CQL components that have already been mentioned in the previous nine queries. As a result, the CQL statement for this query will be as follows:

MATCH (b:Birth_State)<-[r1:born_in]-(n:President)-[r2:member_of]->(m:Member)
WITH COUNT(n.name) AS cnt, m.party_name AS party, b.state_name AS state
RETURN state, party, cnt;

The result set for this CQL statement equals the result set of the SQL statement, but we will not display it here, because it is a long list of values.

Chapter 6

Conclusions and Recommendations

The subject of data model transformations that involve the ORM modelling technique and graph database technology has been studied in this thesis. The main concern of this study was to address the particularities of graph database models and to define a framework that would enable people who work with data models to transform ORM conceptual models into graph databases in an effective way, while maintaining data consistency and integrity. Transformation rules that should be followed in order to obtain a target model that corresponds to a conceptual model, specified using the ORM modelling technique, have been defined in the main part of this thesis. More specifically, transformation rules have been defined for the most commonly used data structures which are prescribed by the ORM language.

The overall conclusion that can be drawn from this study is that it is not always a simple task to implement a rigid information structure, such as an ORM model, on the basis of a database technology which is designed for flexibility, such as the graph database technology. The graph database model is a flexible model in terms of data representation, in the sense that it lacks strict data integrity constraints. More specifically, compound uniqueness constraints and uniqueness constraints on the relationships between the nodes in the graph model are not provided directly by the graph database language. In order to express such constraints using the graph database language, we need to adjust the graph database model itself. We need to modify the graph database model so that all properties which are involved in a uniqueness constraint will be node properties instead of relationship properties. Appropriate adjustments should also be made in the case of compound uniqueness constraints, since these cannot be obtained in a direct way either. In the case that we need to apply a uniqueness constraint on multiple node properties, we should first merge those properties into a single one, and then apply a simple uniqueness constraint on the compound node property.
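The idea of merging several properties into one compound property can be illustrated with a small Python sketch. It mirrors the Score nodes of Chapter 5, whose id combines the candidate and the election year, and checks the uniqueness of that compound key in plain Python. The function names and the sample rows are assumptions made for illustration; in the graph database the check itself is performed by the simple uniqueness constraint on the compound property.

```python
def compound_key(row, fields):
    """Merge several identifying values into one hashable key."""
    return tuple(row[field] for field in fields)


def find_duplicates(rows, fields):
    """Return the compound keys that occur more than once, in input order."""
    seen, duplicates = set(), []
    for row in rows:
        key = compound_key(row, fields)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


# Illustrative rows; the last one repeats an existing (candidate, year) pair.
scores = [
    {"candidate": "Adams J", "election_year": 1796},
    {"candidate": "Jefferson T", "election_year": 1796},
    {"candidate": "Jefferson T", "election_year": 1800},
    {"candidate": "Jefferson T", "election_year": 1800},
]

print(find_duplicates(scores, ["candidate", "election_year"]))
```

Neither property is unique on its own in this sample: the candidate Jefferson T and the year 1796 both occur in several rows. Only the combination identifies a row, which is why the uniqueness constraint has to be placed on the merged property.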

It is clear that the ORM language is much more expressive than the graph database language, and there is no one-to-one correspondence between the two languages. Nevertheless, an ORM model can be transformed into a graph database model in an efficient way, if the transformation framework described in Chapter 4 is followed. This framework has been designed to overcome several limitations and omissions of the graph database language, and to provide an efficient way to express the most commonly used ORM data structures.

Another conclusion of this study is that contemporary database technologies which have been designed for flexibility can still be compatible with older conceptual modelling techniques which focus on strict information structures in order to accommodate data integrity and data consistency. Of course, it is not always a simple task to achieve equality between the models, and some adjustments, which may entail some additional complexity or data redundancy, need to be made in most cases.

Further research can be done along several axes, starting with improvements to the transformation framework that is described in this study. More specifically, there is the opportunity of optimising this transformation framework by defining those advances and conventions that could be adopted in order to simplify the generated target graph database model and to avoid redundancies in population data. Another research branch that can derive from this study has to do with the extension of the existing transformation framework. The transformation process can be enriched with transformation rules for information structures which are not covered in this study because they are used more rarely in practice. Moreover, further research can be done in the area of the graph database language. All those key points and characteristics which are not currently supported by the graph database language could be defined in order to serve as a starting point for the extension of the graph model itself. Last but not least, this study could be used as a starting point for the creation of similar transformation frameworks that would involve other contemporary database stores. Given that the ORM modelling language is very powerful and expressive, there would be plenty of room for further research in the field of data model transformations using as target models database stores such as wide-column stores, key-value stores and document stores.

Appendix A

Graph Data

A.1 Person

Table A.1: Person

name

Adams J
Adams J Q
Agnew S T
Appleton J M
Arthur C A
Axson E L
Barkley A W
Bouvier J L
Breckinridge
Buchanan J
Burr A
Bush G
Calhoun J
Carow E K
Carter J E
Childress S
Christian L
Cleveland G
Clinton G
Colfax S
Coolidge C
Curtis C
Custis M D
Dallas G M
Davis N
Dawes C G
De Vane King
De Wolfe F K
Dent J B
Dimmick M S L
Doud G
Eisenhower D D
Fairbanks C W
Fillmore M
Folson F
Ford G R
Galt E B
Gardiner J
Garfield J A
Garner J N
Gerry E
Goodhue G A
Grant U S
Hamlin H
Harding W G
Harrison B
Harrison W H
Hayes R B
Hendricks T A
Henry L
Herndon E L
Herron H
Hobart G A
Hoes H
Hoover H C
Humphrey H H
Jackson A
Jefferson T
Johnson A
Johnson L B
Johnson L C
Johnson R M
Kennedy J F
Kortright E
Lee A H
Lincoln A
Madison J
Marshall T R
McCardle E
McIntosh C C
McKinley W
Mondale W F
Monroe J
Morton L P
Nixon R M
Pierce F
Polk J K
Powers A
Reagan R
Robards R D
Roosevelt A E
Roosevelt F D
Roosevelt T
Rudolph L
Ryan T C
Saxton I
Scott C L
Sherman J S
Skelton M W
Smith A
Smith M M
Smith R
Stevenson A E
Symmes A T
Taft W H
Taylor C A
Taylor Z
Todd D D P
Todd M
Tompkins D
Truman H S
Tyler J
Van Buren M
Wallace E V
Wallace H A
Warren E B
Washington G
Webb L W
Wheeler W
Wilson H
Wilson W
Wyman J

A.2 President

Table A.2: President

| name | birth year | death age | years served |
| --- | --- | --- | --- |
| Washington G | 1732 | 67 | 7 |
| Adams J | 1735 | 90 | 4 |
| Jefferson T | 1743 | 83 | 8 |
| Madison J | 1751 | 85 | 8 |
| Monroe J | 1758 | 73 | 8 |
| Adams J Q | 1767 | 80 | 4 |
| Jackson A | 1767 | 78 | 8 |
| Van Buren M | 1782 | 79 | 4 |
| Harrison W H | 1773 | 68 | 0 |
| Tyler J | 1790 | 71 | 3 |
| Polk J K | 1795 | 53 | 4 |
| Taylor Z | 1784 | 65 | 1 |
| Fillmore M | 1800 | 74 | 2 |
| Pierce F | 1804 | 64 | 4 |
| Buchanan J | 1791 | 77 | 4 |
| Lincoln A | 1809 | 56 | 4 |
| Johnson A | 1808 | 66 | 3 |
| Grant U S | 1822 | 63 | 8 |
| Hayes R B | 1822 | 70 | 4 |
| Garfield J A | 1831 | 49 | 0 |
| Arthur C A | 1830 | 56 | 3 |
| Cleveland G | 1837 | 71 | 8 |
| Harrison B | 1833 | 67 | 4 |
| McKinley W | 1843 | 58 | 4 |
| Roosevelt T | 1858 | 60 | 7 |
| Taft W H | 1857 | 72 | 4 |
| Wilson W | 1856 | 67 | 8 |
| Harding W G | 1865 | 57 | 2 |
| Coolidge C | 1872 | 60 | 5 |
| Hoover H C | 1874 | 90 | 4 |
| Roosevelt F D | 1882 | 63 | 12 |
| Truman H S | 1884 | 88 | 7 |
| Eisenhower D D | 1890 | 79 | 8 |
| Kennedy J F | 1917 | 46 | 2 |
| Johnson L B | 1908 | 65 | 5 |
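As an illustration of how population data such as Table A.2 might be loaded into Neo4j, the following minimal sketch turns one row into a parameterised Cypher statement. It is not the transformation defined in this thesis: the node label `President`, the property names `birthYear`, `deathAge` and `yearsServed`, and the helper names `parse_row` and `to_statement` are all assumptions made for the example.

```python
from typing import Dict, NamedTuple, Tuple


class PresidentRow(NamedTuple):
    name: str
    birth_year: int
    death_age: int
    years_served: int


def parse_row(line: str) -> PresidentRow:
    """Parse a row of Table A.2: a name followed by three integers.

    Column separators ('|') are tolerated and treated as whitespace.
    """
    tokens = line.replace("|", " ").split()
    # The last three tokens are the numeric columns; the rest is the name.
    *name_parts, birth_year, death_age, years_served = tokens
    return PresidentRow(
        " ".join(name_parts), int(birth_year), int(death_age), int(years_served)
    )


# Hypothetical Cypher template: label and property names are assumptions,
# not the output of the thesis's transformation rules.
MERGE_PRESIDENT = (
    "MERGE (p:President {name: $name}) "
    "SET p.birthYear = $birth_year, "
    "p.deathAge = $death_age, "
    "p.yearsServed = $years_served"
)


def to_statement(row: PresidentRow) -> Tuple[str, Dict[str, object]]:
    """Return the Cypher text together with its parameter map."""
    return MERGE_PRESIDENT, dict(row._asdict())


if __name__ == "__main__":
    statement, params = to_statement(parse_row("Van Buren M 1782 79 4"))
    print(statement)
    print(params)
```

Passing values as parameters rather than concatenating them into the query text avoids quoting problems with names, and `MERGE` on the name keeps a repeated load from creating duplicate nodes.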

A.3 Vice President

Table A.3: Vice President

Identifier (vp id): admin nr, president name, vice pres name.

| admin nr | president name | vice pres name |
| --- | --- | --- |
| 1 | Washington G | Adams J |
| 2 | Washington G | Adams J |
| 3 | Adams J | Jefferson T |
| 4 | Jefferson T | Burr A |
| 5 | Jefferson T | Clinton G |
| 6 | Madison J | Clinton G |
| 7 | Madison J | Gerry E |
| 8 | Monroe J | Tompkins D |
| 9 | Monroe J | Tompkins D |
| 10 | Adams J Q | Calhoun J |
| 11 | Jackson A | Calhoun J |
| 12 | Jackson A | Van Buren M |
| 13 | Van Buren M | Johnson R M |
| 14 | Harrison W H | Tyler J |
| 15 | Polk J K | Dallas G M |
| 16 | Taylor Z | Fillmore M |
| 17 | Pierce F | De Vane King |
| 18 | Buchanan J | Breckinridge |
| 19 | Lincoln A | Hamlin H |
| 20 | Lincoln A | Johnson A |
| 21 | Grant U S | Colfax S |
| 22 | Grant U S | Wilson H |
| 23 | Hayes R B | Wheeler W |
| 24 | Garfield J A | Arthur C A |
| 25 | Cleveland G | Hendricks T A |
| 26 | Harrison B | Morton L P |
| 27 | Cleveland G | Stevenson A E |
| 28 | McKinley W | Hobart G A |
| 29 | McKinley W | Roosevelt T |
| 30 | Roosevelt T | Fairbanks C W |
| 31 | Taft W H | Sherman J S |
| 32 | Wilson W | Marshall T R |
| 33 | Wilson W | Marshall T R |
| 34 | Harding W G | Coolidge C |
| 35 | Coolidge C | Dawes C G |
| 36 | Hoover H C | Curtis C |
| 37 | Roosevelt F D | Garner J N |
| 38 | Roosevelt F D | Garner J N |
| 39 | Roosevelt F D | Wallace H A |
| 40 | Roosevelt F D | Truman H S |
| 41 | Truman H S | Barkley A W |
| 42 | Eisenhower D D | Nixon R M |
| 43 | Eisenhower D D | Nixon R M |
| 44 | Kennedy J F | Johnson L B |
| 45 | Johnson L B | Humphrey H H |
| 46 | Nixon R M | Agnew S T |
| 47 | Nixon R M | Agnew S T |
| 48 | Carter J E | Mondale W F |
| 49 | Reagan R | Bush G |

A.4 Administration

Table A.4: Administration

| admin nr | year inaugurated |
| --- | --- |
| 1 | 1789 |
| 2 | 1793 |
| 3 | 1797 |
| 4 | 1801 |
| 5 | 1805 |
| 6 | 1809 |
| 7 | 1813 |
| 8 | 1817 |
| 9 | 1821 |
| 10 | 1825 |
| 11 | 1829 |
| 12 | 1833 |
| 13 | 1837 |
| 14 | 1841 |
| 15 | 1845 |
| 16 | 1849 |
| 17 | 1853 |
| 18 | 1857 |
| 19 | 1861 |
| 20 | 1865 |
| 21 | 1869 |
| 22 | 1873 |
| 23 | 1877 |
| 24 | 1881 |
| 25 | 1885 |
| 26 | 1889 |
| 27 | 1893 |
| 28 | 1897 |
| 29 | 1901 |
| 30 | 1905 |
| 31 | 1909 |
| 32 | 1913 |
| 33 | 1917 |
| 34 | 1921 |
| 35 | 1925 |
| 36 | 1929 |
| 37 | 1933 |
| 38 | 1937 |
| 39 | 1941 |
| 40 | 1945 |
| 41 | 1949 |
| 42 | 1953 |
| 43 | 1957 |
| 44 | 1961 |
| 45 | 1965 |
| 46 | 1969 |
| 47 | 1973 |
| 48 | 1977 |
| 49 | 1981 |

A.5 Presidency

Table A.5: Presidency

| admin nr | person name |
| --- | --- |
| 1 | Washington G |
| 2 | Washington G |
| 3 | Adams J |
| 4 | Jefferson T |
| 5 | Jefferson T |
| 6 | Madison J |
| 7 | Madison J |
| 8 | Monroe J |
| 9 | Monroe J |
| 10 | Adams J Q |
| 11 | Jackson A |
| 12 | Jackson A |
| 13 | Van Buren M |
| 14 | Harrison W H |
| 15 | Polk J K |
| 16 | Taylor Z |
| 17 | Pierce F |
| 18 | Buchanan J |
| 19 | Lincoln A |
| 20 | Lincoln A |
| 21 | Grant U S |
| 22 | Grant U S |
| 23 | Hayes R B |
| 24 | Garfield J A |
| 25 | Cleveland G |
| 26 | Harrison B |
| 27 | Cleveland G |
| 28 | McKinley W |
| 29 | McKinley W |
| 30 | Roosevelt T |
| 31 | Taft W H |
| 32 | Wilson W |
| 33 | Wilson W |
| 34 | Harding W G |
| 35 | Coolidge C |
| 36 | Hoover H C |
| 37 | Roosevelt F D |
| 38 | Roosevelt F D |
| 39 | Roosevelt F D |
| 40 | Roosevelt F D |
| 41 | Truman H S |
| 42 | Eisenhower D D |
| 43 | Eisenhower D D |
| 44 | Kennedy J F |
| 45 | Johnson L B |
| 46 | Nixon R M |
| 47 | Nixon R M |
| 47 | Nixon R M |
| 47 | Ford G R |
| 48 | Carter J E |
| 49 | Reagan R |

A.6 Marriage

Table A.6: Marriage

Identifier (id): president name, spouse name.

| president name | spouse name | year | president age | spouse age | nr of children |
| --- | --- | --- | --- | --- | --- |
| Washington G | Custis M D | 1759 | 26 | 27 | 0 |
| Adams J | Smith A | 1764 | 28 | 19 | 5 |
| Jefferson T | Skelton M W | 1772 | 28 | 23 | 6 |
| Madison J | Todd D D P | 1794 | 43 | 26 | 0 |
| Monroe J | Kortright E | 1786 | 27 | 17 | 3 |
| Adams J Q | Johnson L C | 1797 | 30 | 22 | 4 |
| Jackson A | Robards R D | 1794 | 26 | 26 | 0 |
| Van Buren M | Hoes H | 1807 | 24 | 23 | 4 |
| Harrison W H | Symmes A T | 1795 | 22 | 20 | 10 |
| Tyler J | Christian L | 1813 | 23 | 22 | 8 |
| Tyler J | Gardiner J | 1844 | 54 | 24 | 7 |
| Polk J K | Childress S | 1824 | 28 | 20 | 0 |
| Taylor Z | Smith M M | 1810 | 25 | 21 | 6 |
| Fillmore M | Powers A | 1826 | 26 | 27 | 2 |
| Fillmore M | McIntosh C C | 1858 | 58 | 44 | 0 |
| Pierce F | Appleton J M | 1834 | 29 | 28 | 3 |
| Lincoln A | Todd M | 1842 | 33 | 23 | 4 |
| Johnson A | McCardle E | 1827 | 18 | 16 | 5 |
| Grant U S | Dent J B | 1848 | 26 | 22 | 4 |
| Hayes R B | Webb L W | 1852 | 30 | 21 | 8 |
| Garfield J A | Rudolph L | 1858 | 26 | 26 | 7 |
| Arthur C A | Herndon E L | 1859 | 29 | 22 | 3 |
| Cleveland G | Folson F | 1886 | 49 | 21 | 5 |
| Harrison B | Scott C L | 1853 | 20 | 21 | 2 |
| Harrison B | Dimmick M S L | 1896 | 62 | 37 | 1 |
| McKinley W | Saxton I | 1871 | 27 | 23 | 2 |
| Roosevelt T | Lee A H | 1880 | 22 | 19 | 1 |
| Roosevelt T | Carow E K | 1886 | 28 | 25 | 5 |
| Taft W H | Herron H | 1886 | 28 | 25 | 3 |
| Wilson W | Axson E L | 1885 | 28 | 25 | 3 |
| Wilson W | Galt E B | 1915 | 58 | 43 | 0 |
| Harding W G | De Wolfe F K | 1891 | 25 | 30 | 0 |
| Coolidge C | Goodhue G A | 1905 | 33 | 26 | 2 |
| Hoover H C | Henry L | 1899 | 24 | 23 | 2 |
| Roosevelt F D | Roosevelt A E | 1905 | 23 | 20 | 6 |
| Truman H S | Wallace E V | 1919 | 35 | 34 | 1 |
| Eisenhower D D | Doud G | 1916 | 25 | 19 | 2 |
| Kennedy J F | Bouvier J L | 1953 | 36 | 24 | 3 |
| Johnson L B | Taylor C A | 1934 | 26 | 21 | 2 |
| Nixon R M | Ryan T C | 1940 | 27 | 28 | 2 |
| Ford G R | Warren E B | 1948 | 35 | 30 | 4 |
| Carter J E | Smith R | 1946 | 21 | 18 | 4 |
| Reagan R | Wyman J | 1940 | 28 | 25 | 2 |
| Reagan R | Davis N | 1952 | 41 | 28 | 2 |

A.7 Winner

Table A.7: Winner

| president name | election year |
| --- | --- |
| Washington G | 1789 |
| Washington G | 1792 |
| Adams J | 1796 |
| Jefferson T | 1800 |
| Jefferson T | 1804 |
| Madison J | 1808 |
| Madison J | 1812 |
| Monroe J | 1816 |
| Monroe J | 1820 |
| Adams J Q | 1824 |
| Jackson A | 1828 |
| Jackson A | 1832 |
| Van Buren M | 1836 |
| Harrison W H | 1840 |
| Polk J K | 1844 |
| Taylor Z | 1848 |
| Pierce F | 1852 |
| Buchanan J | 1856 |
| Lincoln A | 1860 |
| Lincoln A | 1864 |
| Grant U S | 1868 |
| Grant U S | 1872 |
| Hayes R B | 1876 |
| Garfield J A | 1880 |
| Cleveland G | 1884 |
| Harrison B | 1888 |
| Cleveland G | 1892 |
| McKinley W | 1896 |
| McKinley W | 1900 |
| Roosevelt T | 1904 |
| Taft W H | 1908 |
| Wilson W | 1912 |
| Wilson W | 1916 |
| Harding W G | 1920 |
| Coolidge C | 1924 |
| Hoover H C | 1928 |
| Roosevelt F D | 1932 |
| Roosevelt F D | 1936 |
| Roosevelt F D | 1940 |
| Roosevelt F D | 1944 |
| Truman H S | 1948 |
| Eisenhower D D | 1952 |
| Eisenhower D D | 1956 |
| Kennedy J F | 1960 |
| Johnson L B | 1964 |
| Nixon R M | 1968 |
| Nixon R M | 1972 |
| Carter J E | 1976 |
| Reagan R | 1980 |

A.8 Score

Table A.8: Score

Identifier (id): candidate, election year.

| candidate | election year | votes |
| --- | --- | --- |
| Washington G | 1789 | 69 |
| Adams J | 1789 | 34 |
| Jay J | 1789 | 9 |
| Harrison R H | 1789 | 6 |
| Rutledge J | 1789 | 6 |
| Hancock J | 1789 | 4 |
| Clinton G | 1789 | 3 |
| Huntington S | 1789 | 2 |
| Milton J | 1789 | 2 |
| Armstrong J | 1789 | 1 |
| Lincoln B | 1789 | 1 |
| Telfair E | 1789 | 1 |
| Washington G | 1792 | 132 |
| Adams J | 1792 | 77 |
| Clinton G | 1792 | 50 |
| Jefferson T | 1792 | 4 |
| Burr A | 1792 | 1 |
| Adams J | 1796 | 71 |
| Jefferson T | 1796 | 68 |
| Pinckney T | 1796 | 59 |
| Burr A | 1796 | 30 |
| Adams S | 1796 | 15 |
| Ellsworth O | 1796 | 11 |
| Clinton G | 1796 | 7 |
| Jay J | 1796 | 5 |
| Iredell J | 1796 | 3 |
| Henry J | 1796 | 2 |
| Johnson S | 1796 | 2 |
| Washington G | 1796 | 2 |
| Pinckney C C | 1796 | 1 |
| Jefferson T | 1800 | 73 |
| Burr A | 1800 | 73 |
| Adams J | 1800 | 65 |
| Pinckney C C | 1800 | 64 |
| Jay J | 1800 | 1 |
| Jefferson T | 1804 | 162 |
| Pinckney C C | 1804 | 14 |
| Madison J | 1808 | 122 |
| Pinckney C C | 1808 | 47 |
| Clinton G | 1808 | 6 |
| Madison J | 1812 | 128 |
| Clinton G | 1812 | 89 |
| Monroe J | 1816 | 183 |
| King R | 1816 | 34 |
| Monroe J | 1820 | 231 |
| Adams J Q | 1820 | 1 |
| Adams J Q | 1824 | 84 |
| Jackson A | 1824 | 99 |
| Crawford W H | 1824 | 41 |
| Clay H | 1824 | 37 |
| Jackson A | 1828 | 178 |
| Adams J | 1828 | 83 |
| Jackson A | 1832 | 219 |
| Clay H | 1832 | 49 |
| Floyd J | 1832 | 11 |
| Wirt W | 1832 | 7 |
| Van Buren M | 1836 | 170 |
| Harrison W H | 1836 | 73 |
| White H L | 1836 | 26 |
| Webster D | 1836 | 14 |
| Mangum W P | 1836 | 11 |
| Harrison W H | 1840 | 234 |
| Van Buren M | 1840 | 60 |
| Polk J K | 1844 | 170 |
| Clay H | 1844 | 105 |
| Taylor Z | 1848 | 163 |
| Cass L | 1848 | 127 |
| Pierce F | 1852 | 254 |
| Scott W | 1852 | 42 |
| Buchanan J | 1856 | 174 |
| Fremont J C | 1856 | 114 |
| Fillmore M | 1856 | 8 |
| Lincoln A | 1860 | 180 |
| Breckinridge J | 1860 | 72 |
| Bell J | 1860 | 39 |
| Douglas S | 1860 | 12 |
| Lincoln A | 1864 | 212 |
| McClellan G B | 1864 | 21 |
| Grant U S | 1868 | 214 |
| Seymour | 1868 | 80 |
| Grant U S | 1872 | 286 |
| Hendricks T A | 1872 | 42 |
| Brown B G | 1872 | 18 |
| Jenkins C J | 1872 | 2 |
| Davis D | 1872 | 1 |
| Hayes R B | 1876 | 185 |
| Tilden S J | 1876 | 184 |
| Garfield J A | 1880 | 214 |
| Hancock W S | 1880 | 155 |
| Cleveland G | 1884 | 219 |
| Blaine J G | 1884 | 182 |
| Harrison B | 1888 | 233 |
| Cleveland G | 1888 | 168 |
| Cleveland G | 1892 | 277 |
| Harrison B | 1892 | 145 |
| Weaver J B | 1892 | 22 |
| McKinley W | 1896 | 271 |
| Bryan W J | 1896 | 176 |
| McKinley W | 1900 | 292 |
| Bryan W J | 1900 | 155 |
| Roosevelt T | 1904 | 336 |
| Parker A B | 1904 | 140 |
| Taft W H | 1908 | 321 |
| Bryan W J | 1908 | 162 |
| Wilson W | 1912 | 435 |
| Roosevelt T | 1912 | 88 |
| Taft W H | 1912 | 8 |
| Wilson W | 1916 | 277 |
| Hughes C E | 1916 | 254 |
| Harding W G | 1920 | 404 |
| Cox W W | 1920 | 127 |
| Coolidge C | 1924 | 382 |
| Davis J W | 1924 | 136 |
| La Follette R M | 1924 | 13 |
| Hoover H C | 1928 | 444 |
| Smith A E | 1928 | 87 |
| Roosevelt F D | 1932 | 472 |
| Hoover H C | 1932 | 59 |
| Roosevelt F D | 1936 | 523 |
| Landon A M | 1936 | 8 |
| Roosevelt F D | 1940 | 449 |
| Wilkie W L | 1940 | 82 |
| Roosevelt F D | 1944 | 432 |
| Dewey T E | 1944 | 99 |
| Truman H S | 1948 | 303 |
| Dewey T E | 1948 | 189 |
| Thurmond J S | 1948 | 39 |
| Eisenhower D D | 1952 | 442 |
| Stevenson A | 1952 | 89 |
| Eisenhower D D | 1956 | 457 |
| Stevenson A | 1956 | 73 |
| Jones W B | 1956 | 1 |
| Kennedy J F | 1960 | 303 |
| Nixon R M | 1960 | 219 |
| Byrd | 1960 | 15 |
| Johnson L B | 1964 | 486 |
| Goldwater B | 1964 | 52 |
| Nixon R M | 1968 | 301 |
| Humphrey H H | 1968 | 191 |
| Wallace G C | 1968 | 46 |
| Nixon R M | 1972 | 520 |
| McGovern G S | 1972 | 17 |
| Hospers J | 1972 | 1 |
| Carter J E | 1976 | 297 |
| Ford G R | 1976 | 240 |
| Reagan R | 1980 | 489 |
| Carter J | 1980 | 49 |

A.9 Election

Table A.9: Election

| election year |
| --- |
| 1789 |
| 1792 |
| 1796 |
| 1800 |
| 1804 |
| 1808 |
| 1812 |
| 1816 |
| 1820 |
| 1824 |
| 1828 |
| 1832 |
| 1836 |
| 1840 |
| 1844 |
| 1848 |
| 1852 |
| 1856 |
| 1860 |
| 1864 |
| 1868 |
| 1872 |
| 1876 |
| 1880 |
| 1884 |
| 1888 |
| 1892 |
| 1896 |
| 1900 |
| 1904 |
| 1908 |
| 1912 |
| 1916 |
| 1920 |
| 1924 |
| 1928 |
| 1932 |
| 1936 |
| 1940 |
| 1944 |
| 1948 |
| 1952 |
| 1956 |
| 1960 |
| 1964 |
| 1968 |
| 1972 |
| 1976 |
| 1980 |

A.10 Hobby

Table A.10: Hobby

| president name | hobby name |
| --- | --- |
| Adams J Q | Billiards |
| Adams J Q | Swimming |
| Adams J Q | Walking |
| Arthur C A | Fishing |
| Cleveland G | Fishing |
| Coolidge C | Fishing |
| Coolidge C | Golf |
| Coolidge C | Indian Clubs |
| Coolidge C | Mechanical Horse |
| Coolidge C | Pitching Hay |
| Eisenhower D D | Bridge |
| Eisenhower D D | Golf |
| Eisenhower D D | Hunting |
| Eisenhower D D | Painting |
| Eisenhower D D | Fishing |
| Garfield J A | Billiards |
| Harding W G | Golf |
| Harding W G | Poker |
| Harding W G | Riding |
| Harrison B | Hunting |
| Hayes R B | Croquet |
| Hayes R B | Driving |
| Hayes R B | Shooting |
| Hoover H C | Fishing |
| Hoover H C | Medicine Ball |
| Jackson A | Riding |
| Jefferson T | Fishing |
| Jefferson T | Riding |
| Johnson L B | Riding |
| Kennedy J F | Sailing |
| Kennedy J F | Swimming |
| Kennedy J F | Touch Football |
| Lincoln A | Walking |
| McKinley W | Riding |
| McKinley W | Swimming |
| McKinley W | Walking |
| Nixon R M | Golf |
| Roosevelt F D | Fishing |
| Roosevelt F D | Sailing |
| Roosevelt F D | Swimming |
| Roosevelt T | Boxing |
| Roosevelt T | Hunting |
| Roosevelt T | Jujitsu |
| Roosevelt T | Riding |
| Roosevelt T | Shooting |
| Roosevelt T | Tennis |
| Roosevelt T | Wrestling |
| Taft W H | Golf |
| Taft W H | Riding |
| Taylor Z | Riding |
| Truman H S | Fishing |
| Truman H S | Poker |
| Truman H S | Walking |
| Van Buren M | Riding |
| Washington G | Fishing |
| Washington G | Riding |
| Wilson W | Golf |
| Wilson W | Riding |
| Wilson W | Walking |

A.11 Birth State

Table A.11: Birth State

| president name | state name |
| --- | --- |
| Washington G | Virginia |
| Adams J | Massachusetts |
| Jefferson T | Virginia |
| Madison J | Virginia |
| Monroe J | Virginia |
| Adams J Q | Massachusetts |
| Jackson A | South Carolina |
| Van Buren M | New York |
| Harrison W H | Virginia |
| Tyler J | Virginia |
| Polk J K | North Carolina |
| Taylor Z | Virginia |
| Fillmore M | New York |
| Pierce F | New Hampshire |
| Buchanan J | Pennsylvania |
| Lincoln A | Kentucky |
| Johnson A | North Carolina |
| Grant U S | Ohio |
| Hayes R B | Ohio |
| Garfield J A | Ohio |
| Arthur C A | Vermont |
| Cleveland G | New Jersey |
| Harrison B | Ohio |
| McKinley W | Ohio |
| Roosevelt T | New York |
| Taft W H | Ohio |
| Wilson W | Virginia |
| Harding W G | Ohio |
| Coolidge C | Vermont |
| Hoover H C | Iowa |
| Roosevelt F D | New York |
| Truman H S | Missouri |
| Eisenhower D D | Texas |
| Kennedy J F | Massachusetts |
| Johnson L B | Texas |
| Nixon R M | California |
| Ford G R | Nebraska |
| Carter J E | Georgia |
| Reagan R | Illinois |

A.12 State

Table A.12: State

| state name | union entered year |
| --- | --- |
| Massachusetts | 1776 |
| Pennsylvania | 1776 |
| Virginia | 1776 |
| Connecticut | 1776 |
| South Carolina | 1776 |
| Maryland | 1776 |
| New Jersey | 1776 |
| Georgia | 1776 |
| New Hampshire | 1776 |
| Delaware | 1776 |
| New York | 1776 |
| North Carolina | 1776 |
| Rhode Island | 1776 |
| Vermont | 1791 |
| Kentucky | 1792 |
| Tennessee | 1796 |
| Ohio | 1803 |
| Louisianna | 1812 |
| Indiana | 1816 |
| Mississippi | 1817 |
| Illinois | 1818 |
| Alabama | 1819 |
| Maine | 1820 |
| Missouri | 1821 |
| Arkansas | 1836 |
| Michigan | 1837 |
| Florida | 1845 |
| Texas | 1845 |
| Iowa | 1846 |
| Wisconsin | 1848 |
| California | 1850 |
| Minnesota | 1858 |
| Oregon | 1859 |
| Kansas | 1861 |
| West Virginia | 1863 |
| Nevada | 1864 |
| Nebraska | 1867 |
| Colorado | 1876 |
| North Dakota | 1889 |
| South Dakota | 1889 |
| Montana | 1889 |
| Washington | 1889 |
| Idaho | 1890 |
| Wyoming | 1890 |
| Utah | 1896 |
| Oklahoma | 1907 |
| New Mexico | 1912 |
| Arizona | 1912 |
| Alaska | 1959 |
| Hawaii | 1959 |

A.13 Member

Table A.13: Member

| president name | party name |
| --- | --- |
| Washington G | Federalist |
| Adams J | Federalist |
| Jefferson T | Demo-Rep |
| Madison J | Demo-Rep |
| Monroe J | Demo-Rep |
| Adams J Q | Demo-Rep |
| Jackson A | Democratic |
| Van Buren M | Democratic |
| Harrison W H | Whig |
| Tyler J | Whig |
| Polk J K | Democratic |
| Taylor Z | Whig |
| Fillmore M | Whig |
| Pierce F | Democratic |
| Buchanan J | Democratic |
| Lincoln A | Republican |
| Johnson A | Democratic |
| Grant U S | Republican |
| Hayes R B | Republican |
| Garfield J A | Republican |
| Arthur C A | Republican |
| Cleveland G | Democratic |
| Harrison B | Republican |
| McKinley W | Republican |
| Roosevelt T | Republican |
| Taft W H | Republican |
| Wilson W | Democratic |
| Harding W G | Republican |
| Coolidge C | Republican |
| Hoover H C | Republican |
| Roosevelt F D | Democratic |
| Truman H S | Democratic |
| Eisenhower D D | Republican |
| Kennedy J F | Democratic |
| Johnson L B | Democratic |
| Nixon R M | Republican |
| Ford G R | Republican |
| Carter J E | Democratic |
| Reagan R | Republican |

A.14 Party

Table A.14: Party

| party name |
| --- |
| Federalist |
| Demo-Rep |
| Democratic |
| Whig |
| Republican |

Bibliography

[1] Peter Pin-Shan Chen. The entity-relationship model&mdash;toward a unified viewof data. ACM Trans. Database Syst., 1(1):9–36, March 1976. ISSN 0362-5915. doi:10.1145/320434.320440. URL http://doi.acm.org/10.1145/320434.320440.

[2] Ramez Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems (2NdEd.). Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, 1994.ISBN 0-8053-1748-1.

[3] G. M. Nijssen and T. A. Halpin, editors. Conceptual Schema and RelationalDatabase Design: A Fact Oriented Approach. Prentice-Hall, Inc., Upper SaddleRiver, NJ, USA, 1989. ISBN 0-13-167263-0.

[4] A. H. M. Ter Hofstede and Th. P. Van Der Weide. Expressiveness in conceptualdata modelling. Data Knowledge Engineering, 10:65–100, 1993.

[5] E. F. Codd. A relational model of data for large shared data banks. Commun.ACM, 13(6):377–387, June 1970. ISSN 0001-0782. doi: 10.1145/362384.362685.URL http://doi.acm.org/10.1145/362384.362685.

[6] C. J. Date and Hugh Darwen. A Guide to the SQL Standard (4th Ed.): A User’sGuide to the Standard Database Language SQL. Addison-Wesley Longman Pub-lishing Co., Inc., Boston, MA, USA, 1997. ISBN 0-201-96426-0.

[7] A. B. M. Moniruzzaman and Syed Akhter Hossain. Nosql database: New era ofdatabases for big data analytics - classification, characteristics and comparison.CoRR, abs/1307.0191, 2013. URL http://arxiv.org/abs/1307.0191.

[8] Michael Stonebraker. Sql databases v. nosql databases. Commun. ACM, 53(4):10–11, April 2010. ISSN 0001-0782. doi: 10.1145/1721654.1721659. URL http:

//doi.acm.org/10.1145/1721654.1721659.

[9] Ma lgorzata Bach and Aleksandra Werner. Standardization of NoSQL DatabaseLanguages, pages 50–60. Springer International Publishing, Cham, 2014. ISBN978-3-319-06932-6. doi: 10.1007/978-3-319-06932-6 6. URL http://dx.doi.org/

10.1007/978-3-319-06932-6_6.

[10] K. Czarnecki and S. Helsen. Feature-based survey of model transformation ap-proaches. IBM Syst. J., 45(3):621–645, July 2006. ISSN 0018-8670. doi:10.1147/sj.453.0621. URL http://dx.doi.org/10.1147/sj.453.0621.

[11] P. van Bommel, Gy Kovacs, and A Micsik. Transformation of database populationsand operations from the conceptual to the internal level. Information Systems, 19(2):175 – 191, 1994. URL http://www.sciencedirect.com/science/article/

pii/0306437994900094.

78

Page 90: A Transformation from ORM Conceptual Models to Neo4j Graph Database

Bibliography 79

[12] Christian Fahrner and Gottfried Vossen. A survey of database design trans-formations based on the entity-relationship model. Data Knowledge Engi-neering, 15(3):213 – 250, 1995. ISSN 0169-023X. doi: http://dx.doi.org/10.1016/0169-023X(95)00006-E. URL http://www.sciencedirect.com/science/

article/pii/0169023X9500006E.

[13] P. Johannesson. A method for transforming relational schemas into conceptualschemas. In Proceedings of 1994 IEEE 10th International Conference on DataEngineering, pages 190–201, Feb 1994. doi: 10.1109/ICDE.1994.283030.

[14] Daniel Varro. Model Transformation by Example, pages 410–424. Springer BerlinHeidelberg, Berlin, Heidelberg, 2006. ISBN 978-3-540-45773-2. doi: 10.1007/11880240 29. URL http://dx.doi.org/10.1007/11880240_29.

[15] T. A. Halpin and H. A. Proper. Database schema transformation and optimization,pages 191–203. Springer Berlin Heidelberg, Berlin, Heidelberg, 1995. ISBN 978-3-540-48527-8. doi: 10.1007/BFb0020532. URL http://dx.doi.org/10.1007/

BFb0020532.

[16] Ian Robinson, Jim Webber, and Emil Eifrem. Graph Databases. O’Reilly Media,Inc., 2013. ISBN 1449356265, 9781449356262.

[17] Terry Halpin. Object-role modeling: an overview. 1998. URL http://www.orm.

net/pdf/ORMwhitePaper.pdf.

[18] Terry Halpin. Orm/niam object-role modeling. In Peter Bernus, Kai Mertins,and Gunter Schmidt, editors, Handbook on Architectures of Information Systems,International Handbooks on Information Systems, pages 81–101. Springer BerlinHeidelberg, 1998. ISBN 978-3-662-03528-3. doi: 10.1007/978-3-662-03526-9 4. URLhttp://dx.doi.org/10.1007/978-3-662-03526-9_4.

[19] S. Twine. Mapping between a niam conceptual schema and kee frames. DataKnowl. Eng., 4(2):125–155, 1989. doi: 10.1016/0169-023X(89)90037-2. URL http:

//dx.doi.org/10.1016/0169-023X(89)90037-2.

[20] G. H. W. M. Bronts, S. J. Brouwer, C. L. J. Martens, and H. A. Proper. A unifyingobject role modelling theory. Information Systems, 20:213–235, 1998. URL http:

//citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.9359&rank=3.

[21] P. Van Bommel and Th. P. Van Der Weide. Reducing the search spacefor conceptual schema transformation. Data Knowledge Engineering, 8,1993. URL http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=

F75E42DB2BBAA95960E7C0981C3C709B?doi=10.1.1.49.7682&rep=rep1&type=

pdf.

[22] P. van Bommel, A.H.M. ter Hofstede, and Th.P. van der Weide. Semantics and ver-ification of object-role models. INFORMATION SYSTEMS, 16(5):471–495, 1991.

[23] C.M.R. Leung and G.M. Nijssen. Relational database design using the NIAMconceptual schema. Information Systems, 13(2):219 – 227, 1988. URL http://

www.sciencedirect.com/science/article/pii/030643798890018X.

[24] T. A. Halpin and H. A. Proper. Subtyping and polymorphism in object-role mod-elling. Data Knowl. Eng., 15(3):251–281, June 1995. URL http://dx.doi.org/

10.1016/0169-023X(95)00005-D.

Page 91: A Transformation from ORM Conceptual Models to Neo4j Graph Database

Bibliography 80

[25] A. H. M. ter Hofstede, H. A. Proper, and Th. P. van der Weide. Formal definition ofa conceptual language for the description and manipulation of information models.Information Systems, 18(7):489–523, October 1993. URL http://dx.doi.org/10.

1016/0306-4379(93)90004-K.

[26] A.H.M. ter Hofstede, H.A. Proper, and Th.P. van der Weide. A note on schemaequivalence. Technical Report CSI–R9230, Department of Information Systems,University of Nijmegen, Nijmegen, The Netherlands, EU, 1992. URL http://

citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.8640.

[27] A.H.M. ter Hofstede, H.A. Proper, and Th.P. van der Weide. Data modelling incomplex application domains. In Pericles Loucopoulos, editor, Advanced Infor-mation Systems Engineering, volume 593 of Lecture Notes in Computer Science,pages 364–377. Springer Berlin Heidelberg, 1992. ISBN 978-3-540-55481-3. doi:10.1007/BFb0035142. URL http://dx.doi.org/10.1007/BFb0035142.

[28] Brouwer Martens Bronts, S. J. Brouwer, C. L. J. Martens, G. H. W. M. Bronts, andH. A. Proper. Towards a unifying object role modelling theory. In In T.A. Halpinand R. Meersman (Eds.) Proceedings of the First International Conference onObject-Role Modelling (ORM-1), Magnetic Island, pages 259–273, 1994. URL http:

//citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.4798&rank=1.

[29] Rik Van Bruggen. Learning Neo4j. Packt Publishing Ltd, 2014. ISBN 1849517177,9781849517171.

[30] Greg Jordan. Practical Neo4j. Apress, 2014. ISBN 1484200233, 9781484200230.