Data Management Plan
Deliverable n: 2.1
Date: 30 June 2017
Status: Final
Version: 1.0
Authors: Angelo Marguglio (ENG), Andrea Maurino (UNIMIB), Matteo Palmonari (UNIMIB), Nikolay Nikolov (SINTEF)
Contributors: ALL
Reviewers: Titi Roman (SINTEF), Fernando Perales (JOT)
Distribution: PU
Grant n. 732590 - H2020-ICT-2016-2017/H2020-ICT-2016-1
EW_Shopp GA number: 732590 H2020-ICT-2016-2017/H2020-ICT-2016-1
History of Changes
Version | Date | Description | Revised by
0.2 | 23/02/2017 | Outline and main sections added to the deliverable | Angelo Marguglio (ENG)
0.3 | 09/05/2017 | Updated document template | Angelo Marguglio (ENG)
0.4 | 21/05/2017 | Following comments from Matteo Palmonari (UNIMIB), provided further details on §5, §6, and §7.4. | Angelo Marguglio (ENG)
0.5 | 24/05/2017 | Assigned main roles | Angelo Marguglio (ENG)
0.6 | 29/05/2017 | First sample description of JOT datasets to be used in BC4 | Angelo Marguglio (ENG)
0.7 | 02/06/2017 | Minor revisions | Angelo Marguglio (ENG)
0.8 | 08/06/2017 | Updated dataset description structure; added BC4 description; integrated contributions from SINTEF | Angelo Marguglio (ENG), Nikolay Nikolov (SINTEF)
0.9 | 13/06/2017 | Changed order of “Ethics and Legal Compliance” and “Project Data Management” chapters; added “Ethics and Legal requirements” paragraph; added legal requirements regarding personal data; added mapping table Dataset-BC. Deliverable ready for peer reviewing. | Angelo Marguglio (ENG), Andrea Maurino (UNIMIB), Nikolay Nikolov (SINTEF)
1.0 | 28/06/2017 | Deliverable revised according to reviewer’s comment: added more information about the GDPR normative in the general section. Added last input to complete dataset description. Added explanation about Interoperability and Vocabulary in EW-Shopp project. Final check by project coordinator. | Angelo Marguglio (ENG), Matteo Palmonari (UNIMIB)
Executive summary
EW-Shopp aims to help companies operating in the fragmented European ecosystem of the eCommerce, Retail and Marketing industries increase their efficiency and competitiveness by leveraging deep customer insights that are too challenging for them to obtain today. The integration of public and private data collected by different business partners will cover customer interactions and activities across different channels, providing insights into rich customer journeys. These integrated data will be further enriched with information about weather and events, two crucial factors impacting consumer choices. To realize these objectives, a platform, also referred to as the EW-Shopp platform, will be built.
The Data Management Plan (DMP) reports on the data that the EW-Shopp project will use and generate during its lifetime, from the setup of the EW-Shopp Platform to the business exploitation of its services.
The deliverable, following the Horizon 2020 guidelines¹, defines the general approach to data management policies that will be adopted in the EW-Shopp project. In accordance with these guidelines, this deliverable includes information about the handling of data during and after the end of the project, paying particular attention to the methodology and standards to be applied.
In addition to the guidelines provided by the European Commission, this document also refers to the plan to address the legal and ethical issues related to the data that will be collected, in close collaboration with the activities undertaken by the EW-Shopp Ethics Advisory Board and the main outcomes from WP7.
The deliverable describes the approach established in EW-Shopp to ensure the life-cycle management of the public and proprietary datasets provided by the consortium members to the project, as well as other datasets produced by the consortium during the project execution.
In particular, this report describes the rules, best practices and standards used to make the data findable, accessible, interoperable and reusable (FAIR data), and the process to collect and manage data in compliance with ethical and legal requirements. The deliverable includes a high-level description of the four business cases (BC1: Bing Bang, Ceneje, and Browsetel; BC2: GfK; BC3: Measurence; BC4: Jot Internet Media) and descriptions of the datasets provided for the EW-Shopp project, which detail the identification, origin, format, access and security of the data and take into account legal and ethical requirements.
¹ European Commission, Directorate-General for Research & Innovation (26 July 2016). Guidelines on FAIR Data Management in Horizon 2020. Retrieved from http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
Table of Contents
History of Changes
Executive summary
Table of Contents
List of Tables
Acronyms table
Chapter 1 Introduction
1.1 Principles underlying EW-Shopp DMP
1.2 General Approach
1.3 Applicable documents and references
1.4 Updates of this deliverable
Chapter 2 Project Data Management
2.1 Project purposes
2.2 Project data
2.3 Audience
2.4 Roles and responsibilities
Chapter 3 Ethics and Legal Compliance
3.1 Legal requirements regarding personal data
3.1.1 Core concepts
3.1.2 Fundamental Principles
3.1.3 Notification process and data protection impact assessment
3.1.4 Notification process in EW-Shopp project
3.2 Ethics requirements regarding the involvement of human rights
3.3 Intellectual Property Rights
Chapter 4 Business Case high-level description
4.1 Bing Bang, CENEJE (BC1)
4.2 GfK (BC2)
4.3 Measurence (BC3)
4.4 JOT (BC4)
Chapter 5 EW-Shopp Methodology for DMP
5.1 Elements of EW-Shopp Data Management Plan
5.1.1 Dataset IDENTIFICATION
5.1.2 Dataset ORIGIN
5.1.3 Dataset FORMAT
5.1.4 Dataset ACCESS
5.1.5 Data SECURITY
5.2 Process to collect dataset details
Chapter 6 Dataset description
6.1 CE Dataset - Consumer Data: Purchase Intent
6.1.1 Dataset IDENTIFICATION
6.1.2 Dataset ORIGIN
6.1.3 Dataset FORMAT
6.1.4 Dataset ACCESS
6.1.5 Dataset SECURITY
6.1.6 Ethics and Legal requirements
6.2 ME Dataset - Consumer Data: Location analytics data (Hourly)
6.2.1 Dataset IDENTIFICATION
6.2.2 Dataset ORIGIN
6.2.3 Dataset FORMAT
6.2.4 Dataset ACCESS
6.2.5 Dataset SECURITY
6.2.6 Ethics and Legal requirements
6.3 ME Dataset - Consumer Data: Location analytics data (Daily)
6.3.1 Dataset IDENTIFICATION
6.3.2 Dataset ORIGIN
6.3.3 Dataset FORMAT
6.3.4 Dataset ACCESS
6.3.5 Dataset SECURITY
6.3.6 Ethics and Legal requirements
6.4 BB Dataset - Consumer Data: Customer Purchase History
6.4.1 Dataset IDENTIFICATION
6.4.2 Dataset ORIGIN
6.4.3 Dataset FORMAT
6.4.4 Dataset ACCESS
6.4.5 Dataset SECURITY
6.4.6 Ethics and Legal requirements
6.5 BB Dataset - Consumer Data: Consumer Intent and Interaction
6.5.1 Dataset IDENTIFICATION
6.5.2 Dataset ORIGIN
6.5.3 Dataset FORMAT
6.5.4 Dataset ACCESS
6.5.5 Dataset SECURITY
6.5.6 Ethics and Legal requirements
6.6 ME Dataset - Consumer Data: Location analytics data (Weekly)
6.6.1 Dataset IDENTIFICATION
6.6.2 Dataset ORIGIN
6.6.3 Dataset FORMAT
6.6.4 Dataset ACCESS
6.6.5 Dataset SECURITY
6.6.6 Ethics and Legal requirements
6.7 BT Dataset - Customer Communication Data: Contact and Consumer Interaction History
6.7.1 Dataset IDENTIFICATION
6.7.2 Dataset ORIGIN
6.7.3 Dataset FORMAT
6.7.4 Dataset ACCESS
6.7.5 Dataset SECURITY
6.7.6 Ethics and Legal requirements
6.8 ECMWF Dataset - Weather: MARS Historical Data
6.8.1 Dataset IDENTIFICATION
6.8.2 Dataset ORIGIN
6.8.3 Dataset FORMAT
6.8.4 Dataset ACCESS
6.8.5 Dataset SECURITY
6.8.6 Ethics and Legal requirements
6.9 CE Dataset - Products and Categories: Product Attributes
6.9.1 Dataset IDENTIFICATION
6.9.2 Dataset ORIGIN
6.9.3 Dataset FORMAT
6.9.4 Dataset ACCESS
6.9.5 Dataset SECURITY
6.9.6 Ethics and Legal requirements
6.10 JSI Dataset - Media: Event Registry
6.10.1 Dataset IDENTIFICATION
6.10.2 Dataset ORIGIN
6.10.3 Dataset FORMAT
6.10.4 Dataset ACCESS
6.10.5 Dataset SECURITY
6.10.6 Ethics and Legal requirements
6.11 GfK Dataset - Consumer data: Consumer data
6.11.1 Dataset IDENTIFICATION
6.11.2 Dataset ORIGIN
6.11.3 Dataset FORMAT
6.11.4 Dataset ACCESS
6.11.5 Dataset SECURITY
6.11.6 Ethics and Legal requirements
6.12 GfK Dataset - Market data: Sales data
6.12.1 Dataset IDENTIFICATION
6.12.2 Dataset ORIGIN
6.12.3 Dataset FORMAT
6.12.4 Dataset ACCESS
6.12.5 Dataset SECURITY
6.12.6 Ethics and Legal requirements
6.13 GfK Dataset - Products & Categories: Product attributes
6.13.1 Dataset IDENTIFICATION
6.13.2 Dataset ORIGIN
6.13.3 Dataset FORMAT
6.13.4 Dataset ACCESS
6.13.5 Dataset SECURITY
6.13.6 Ethics and Legal requirements
6.14 ME Dataset - Consumer Data: Door counter data
6.14.1 Dataset IDENTIFICATION
6.14.2 Dataset ORIGIN
6.14.3 Dataset FORMAT
6.14.4 Dataset ACCESS
6.14.5 Dataset SECURITY
6.14.6 Ethics and Legal requirements
6.15 BB Dataset - Products and Categories: Product Attributes
6.15.1 Dataset IDENTIFICATION
6.15.2 Dataset ORIGIN
6.15.3 Dataset FORMAT
6.15.4 Dataset ACCESS
6.15.5 Dataset SECURITY
6.15.6 Ethics and Legal requirements
6.16 CE Dataset - Market data: Products price history
6.16.1 Dataset IDENTIFICATION
6.16.2 Dataset ORIGIN
6.16.3 Dataset FORMAT
6.16.4 Dataset ACCESS
6.16.5 Dataset SECURITY
6.16.6 Ethics and Legal requirements
6.17 ME Dataset - Consumer Data: Sales data
6.17.1 Dataset IDENTIFICATION
6.17.2 Dataset ORIGIN
6.17.3 Dataset FORMAT
6.17.4 Dataset ACCESS
6.17.5 Dataset SECURITY
6.17.6 Ethics and Legal requirements
6.18 JOT Dataset - Consumer data: Traffic source (Bing)
6.18.1 Dataset IDENTIFICATION
6.18.2 Dataset ORIGIN
6.18.3 Dataset FORMAT
6.18.4 Dataset ACCESS
6.18.5 Dataset SECURITY
6.18.6 Ethics and Legal requirements
6.19 JOT Dataset - Consumer data: Traffic source (Google)
6.19.1 Dataset IDENTIFICATION
6.19.2 Dataset ORIGIN
6.19.3 Dataset FORMAT
6.19.4 Dataset ACCESS
6.19.5 Dataset SECURITY
6.19.6 Ethics and Legal requirements
6.20 JOT Dataset - Market data: Twitter trends
6.20.1 Dataset IDENTIFICATION
6.20.2 Dataset ORIGIN
6.20.3 Dataset FORMAT
6.20.4 Dataset ACCESS
6.20.5 Dataset SECURITY
6.20.6 Ethics and Legal requirements
6.21 LOD Dataset - Geographic: DBpedia
6.21.1 Dataset IDENTIFICATION
6.21.2 Dataset ORIGIN
6.21.3 Dataset FORMAT
6.21.4 Dataset ACCESS
6.21.5 Dataset SECURITY
6.21.6 Ethics and Legal requirements
6.22 LOD Dataset - Geographic: Linked Open Street Maps
6.22.1 Dataset IDENTIFICATION
6.22.2 Dataset ORIGIN
6.22.3 Dataset FORMAT
6.22.4 Dataset ACCESS
6.22.5 Dataset SECURITY
6.22.6 Ethics and Legal requirements
6.23 LOD Dataset - Geographic: LinkedGeoData
6.23.1 Dataset IDENTIFICATION
6.23.2 Dataset ORIGIN
6.23.3 Dataset FORMAT
6.23.4 Dataset ACCESS
6.23.5 Dataset SECURITY
6.23.6 Ethics and Legal requirements
6.24 LOD Dataset - Geographic: GeoNames
6.24.1 Dataset IDENTIFICATION
6.24.2 Dataset ORIGIN
6.24.3 Dataset FORMAT
6.24.4 Dataset ACCESS
6.24.5 Dataset SECURITY
6.24.6 Ethics and Legal requirements
6.25 Mapping between Dataset and Business case
Chapter 7 Storage and Re-use
7.1 Storage
7.2 Backup and Recovery
7.3 Data Archiving
7.4 Security
7.5 Permission
7.6 Access, Re-use and Licensing
Annex A - DMP Survey
List of Tables
TABLE 1. ABBREVIATIONS AND ACRONYMS
TABLE 2. SHORT REFERENCES FOR PROJECT PARTNERS
TABLE 3. ROLES AND RESPONSIBILITIES OF BENEFICIARIES
TABLE 4. CORE CONCEPTS - EUROPEAN DATA PROTECTION LEGISLATION
TABLE 5. DATASET IDENTIFICATION – PURCHASE INTENT
TABLE 6. DATASET ORIGIN – PURCHASE INTENT
TABLE 7. DATASET FORMAT – PURCHASE INTENT
TABLE 8. MAKING DATA ACCESSIBLE – PURCHASE INTENT
TABLE 9. MAKING DATA INTEROPERABLE – PURCHASE INTENT
TABLE 10. DATASET SECURITY - PURCHASE INTENT
TABLE 11. DATASET IDENTIFICATION – LOCATION ANALYTICS DATA
TABLE 12. DATASET ORIGIN – LOCATION ANALYTICS DATA
TABLE 13. DATASET FORMAT – LOCATION ANALYTICS DATA
TABLE 14. MAKING DATA ACCESSIBLE – LOCATION ANALYTICS DATA
TABLE 15. MAKING DATA INTEROPERABLE – LOCATION ANALYTICS DATA
TABLE 16. DATASET SECURITY - LOCATION ANALYTICS DATA
TABLE 17. DATASET IDENTIFICATION – LOCATION ANALYTICS DATA
TABLE 18. DATASET ORIGIN – LOCATION ANALYTICS DATA
TABLE 19. DATASET FORMAT – LOCATION ANALYTICS DATA
TABLE 20. MAKING DATA ACCESSIBLE – LOCATION ANALYTICS DATA
TABLE 21. MAKING DATA INTEROPERABLE – LOCATION ANALYTICS DATA
TABLE 22. DATASET SECURITY - LOCATION ANALYTICS DATA
TABLE 23. DATASET IDENTIFICATION – CUSTOMER PURCHASE HISTORY
TABLE 24. DATASET ORIGIN – CUSTOMER PURCHASE HISTORY
TABLE 25. DATASET FORMAT – CUSTOMER PURCHASE HISTORY
TABLE 26. MAKING DATA ACCESSIBLE – CUSTOMER PURCHASE HISTORY
TABLE 27. MAKING DATA INTEROPERABLE – CUSTOMER PURCHASE HISTORY
TABLE 28. DATASET SECURITY – CUSTOMER PURCHASE HISTORY
TABLE 29. DATASET IDENTIFICATION – CONSUMER INTENT AND INTERACTION
TABLE 30. DATASET ORIGIN - CONSUMER INTENT AND INTERACTION
TABLE
31DATASETFORMAT–CONSUMERINTENTANDINTERACTION............................................................................56TABLE32MAKINGDATAACCESSIBLE–CONSUMERINTENTANDINTERACTION..............................................................57TABLE33MAKINGDATAINTEROPERABLE–CONSUMERINTENTANDINTERACTION.......................................................57TABLE34DATASETSECURITY–CONSUMERINTENTANDINTERACTION..........................................................................57TABLE35.DATASETIDENTIFICATION–LOCATIONANALYTICSDATA..............................................................................58TABLE36.DATASETORIGIN–LOCATIONANALYTICSDATA..............................................................................................59TABLE37DATASETFORMAT–LOCATIONANALYTICSDATA...........................................................................................59TABLE38MAKINGDATAACCESSIBLE–LOCATIONANALYTICSDATA.............................................................................59TABLE39MAKINGDATAINTEROPERABLE–LOCATIONANALYTICSDATA......................................................................60
EW_Shopp GAnumber:732590 H2020-ICT-2016-2017/H2020-ICT-2016-1 13
TABLE40DATASETSECURITY-LOCATIONANALYTICSDATA..........................................................................................60TABLE41.DATASETIDENTIFICATION–CONTACTANDCONSUMERINTERACTIONHISTORY...............................................61TABLE42DATASETORIGIN–CONTACTANDCONSUMERINTERACTIONHISTORY...............................................................62TABLE43DATASETFORMAT–CONTACTANDCONSUMERINTERACTIONHISTORY............................................................62TABLE44MAKINGDATAACCESSIBLE–CONTACTANDCONSUMERINTERACTIONHISTORY...............................................64TABLE45MAKINGDATAINTEROPERABLE–CONTACTANDCONSUMERINTERACTIONHISTORY.......................................64TABLE46DATASETSECURITY–CONTACTANDCONSUMERINTERACTIONHISTORY...........................................................64TABLE47.DATASETIDENTIFICATION–MARSHISTORICALDATA................................................................................65TABLE48DATASETORIGIN–MARSHISTORICALDATA...............................................................................................65TABLE49DATASETFORMAT–MARSHISTORICALDATA............................................................................................66TABLE50MAKINGDATAACCESSIBLE–MARSHISTORICALDATA...............................................................................66TABLE51MAKINGDATAINTEROPERABLE–MARSHISTORICALDATA.......................................................................66TABLE52DATASETSECURITY–MARSHISTORICALDATA...........................................................................................67TABLE53.DATASETIDENTIFICATION–PRODUCTATTRIBUTES....................................................................................67TABLE54DATASETORIGIN-PRODUCTATTRIBUTES....................................................................................................68TABLE55DATASETFORMAT–PRODUCTATTRIBUTES..........................................................................
.......................68TABLE56MAKINGDATAACCESSIBLE–PRODUCTATTRIBUTES...................................................................................69TABLE57MAKINGDATAINTEROPERABLE–PRODUCTATTRIBUTES............................................................................69TABLE58DATASETSECURITY–PRODUCTATTRIBUTES................................................................................................69TABLE59.DATASETIDENTIFICATION–EVENTREGISTRY............................................................................................70TABLE60DATASETORIGIN–EVENTREGISTRY...........................................................................................................71TABLE61DATASETFORMAT–EVENTREGISTRY.........................................................................................................71TABLE62MAKINGDATAACCESSIBLE–EVENTREGISTRY...........................................................................................71TABLE63MAKINGDATAINTEROPERABLE–EVENTREGISTRY....................................................................................72TABLE64DATASETSECURITY–EVENTREGISTRY.......................................................................................................72TABLE65.DATASETIDENTIFICATION–CONSUMERDATA..........................................................................................73TABLE66DATASETORIGIN–CONSUMERDATA..........................................................................................................73TABLE67DATASETFORMAT–CONSUMERDATA.......................................................................................................74TABLE68MAKINGDATAACCESSIBLE–CONSUMERDATA..........................................................................................74TABLE69MAKINGDATAINTEROPERABLE–CONSUMERDATA..................................................................................75TABLE70DATASETSECURITY–CONSUMERDATA................................................
......................................................75TABLE71.DATASETIDENTIFICATION–SALESDATA.....................................................................................................76TABLE72DATASETORIGIN–SALESDATA.................................................................................................................76TABLE73DATASETFORMAT–SALESDATA...............................................................................................................76TABLE74MAKINGDATAACCESSIBLE–SALESDATA..................................................................................................77TABLE75MAKINGDATAINTEROPERABLE–SALESDATA..........................................................................................77TABLE76DATASETSECURITY–SALESDATA..............................................................................................................77TABLE77.DATASETIDENTIFICATION–PRODUCTATTRIBUTES....................................................................................78TABLE78DATASETORIGIN–PRODUCTATTRIBUTES....................................................................................................79TABLE79DATASETFORMAT–PRODUCTATTRIBUTES.................................................................................................79TABLE80MAKINGDATAACCESSIBLE–PRODUCTATTRIBUTES....................................................................................80TABLE81MAKINGDATAINTEROPERABLE–PRODUCTATTRIBUTES............................................................................81TABLE82DATASETSECURITY–PRODUCTATTRIBUTES................................................................................................81TABLE83.DATASETIDENTIFICATION–DOORCOUNTERDATA....................................................................................81TABLE84DATASETORIGIN–DOORCOUNTERDATA....................................................................................................82TABLE85DATASETFORMAT–DOORCO
UNTERDATA.................................................................................................82TABLE86MAKINGDATAACCESSIBLE–DOORCOUNTERDATA....................................................................................83TABLE87MAKINGDATAINTEROPERABLE–DOORCOUNTERDATA............................................................................83
EW_Shopp GAnumber:732590 H2020-ICT-2016-2017/H2020-ICT-2016-1 14
TABLE88DATASETSECURITY–DOORCOUNTERDATA................................................................................................83TABLE89.DATASETIDENTIFICATION–PRODUCTATTRIBUTES....................................................................................84TABLE90DATASETORIGIN–PRODUCTATTRIBUTES...................................................................................................84TABLE91DATASETFORMAT–PRODUCTATTRIBUTES.................................................................................................85TABLE92MAKINGDATAACCESSIBLE–PRODUCTATTRIBUTES...................................................................................85TABLE93MAKINGDATAINTEROPERABLE–PRODUCTATTRIBUTES............................................................................85TABLE94DATASETSECURITY–PRODUCTATTRIBUTES................................................................................................86TABLE95.DATASETIDENTIFICATION–PRODUCTSPRICEHISTORY...............................................................................86TABLE96DATASETORIGIN-PRODUCTSPRICEHISTORY................................................................................................87TABLE97DATASETFORMAT–PRODUCTSPRICEHISTORY............................................................................................87TABLE98MAKINGDATAACCESSIBLE–PRODUCTSPRICEHISTORY...............................................................................88TABLE99MAKINGDATAINTEROPERABLE–PRODUCTSPRICEHISTORY.......................................................................88TABLE100DATASETSECURITY–PRODUCTSPRICEHISTORY.........................................................................................88TABLE101.DATASETIDENTIFICATION–SALESDATA................................................................................................89TABLE102DATASETORIGIN-SALESDATA...................................................................................................
.............89TABLE103DATASETFORMAT–SALESDATA.............................................................................................................90TABLE104MAKINGDATAACCESSIBLE–SALESDATA................................................................................................90TABLE105MAKINGDATAINTEROPERABLE–SALESDATA........................................................................................90TABLE106DATASETSECURITY–SALESDATA............................................................................................................91TABLE107DATASETIDENTIFICATION–TRAFFICSOURCE(BING).................................................................................91TABLE108DATASETORIGIN-TRAFFICSOURCE(BING)................................................................................................92TABLE109DATASETFORMAT–TRAFFICSOURCE(BING).............................................................................................92TABLE110MAKINGDATAACCESSIBLE–TRAFFICSOURCE(BING)...............................................................................93TABLE111MAKINGDATAINTEROPERABLE–TRAFFICSOURCE(BING)........................................................................93TABLE112DATASETSECURITY–TRAFFICSOURCE(BING)............................................................................................94TABLE113DATASETIDENTIFICATION–TRAFFICSOURCE(GOOGLE)............................................................................94TABLE114DATASETORIGIN-TRAFFICSOURCE(GOOGLE)...........................................................................................95TABLE115DATASETFORMAT–TRAFFICSOURCE(GOOGLE)........................................................................................95TABLE116MAKINGDATAACCESSIBLE–TRAFFICSOURCE(GOOGLE)...........................................................................96TABLE117MAKINGDATAINTEROPERABLE–TRAFFICSOURCE(GOOGLE).........................................................
..........97TABLE118DATASETSECURITY–TRAFFICSOURCE(GOOGLE)........................................................................................97TABLE119DATASETIDENTIFICATION–TWITTERTRENDS..........................................................................................98TABLE120DATASETORIGIN–TWITTERTRENDS.........................................................................................................98TABLE121DATASETFORMAT–TWITTERTRENDS......................................................................................................98TABLE122MAKINGDATAACCESSIBLE–TWITTERTRENDS.........................................................................................99TABLE123MAKINGDATAINTEROPERABLE–TWITTERTRENDS..................................................................................99TABLE124DATASETSECURITY–TWITTERTRENDS....................................................................................................100TABLE125.DATASETIDENTIFICATION–DBPEDIA..................................................................................................100TABLE126DATASETORIGIN–DBPEDIA.................................................................................................................101TABLE127DATASETFORMAT–DBPEDIA...............................................................................................................101TABLE128MAKINGDATAACCESSIBLE–DBPEDIA.................................................................................................102TABLE129MAKINGDATAINTEROPERABLE–DBPEDIA..........................................................................................102TABLE130DATASETSECURITY–DBPEDIA.............................................................................................................102TABLE131.DATASETIDENTIFICATION–LINKEDOPENSTREETMAPS........................................................................103TABLE132DATASETORIGIN–LINKEDOPENSTREETMAPS.............................
...........................................................103TABLE133DATASETFORMAT–LINKEDOPENSTREETMAPS.....................................................................................104TABLE134MAKINGDATAACCESSIBLE–LINKEDOPENSTREETMAPS........................................................................105TABLE135MAKINGDATAINTEROPERABLE–LINKEDOPENSTREETMAPS................................................................105
EW_Shopp GAnumber:732590 H2020-ICT-2016-2017/H2020-ICT-2016-1 15
TABLE136DATASETSECURITY–LINKEDOPENSTREETMAPS....................................................................................105TABLE137.DATASETIDENTIFICATION–LINKEDGEODATA.....................................................................................106TABLE138DATASETORIGIN–LINKEDGEODATA....................................................................................................106TABLE139DATASETFORMAT–LINKEDGEODATA..................................................................................................107TABLE140MAKINGDATAACCESSIBLE–LINKEDGEODATA.....................................................................................107TABLE141MAKINGDATAINTEROPERABLE–LINKEDGEODATA.............................................................................107TABLE142DATASETSECURITY–LINKEDGEODATA.................................................................................................108TABLE143.DATASETIDENTIFICATION–GEONAMES..............................................................................................108TABLE144DATASETORIGIN–GEONAMES.............................................................................................................109TABLE145DATASETFORMAT–GEONAMES...........................................................................................................109TABLE146MAKINGDATAACCESSIBLE–GEONAMES.............................................................................................110TABLE147MAKINGDATAINTEROPERABLE–GEONAMES......................................................................................110TABLE148DATASETSECURITY–GEONAMES.........................................................................................................110TABLE149MAPPINGDATASETANDBUSINESSCASE......................................................................................................111
EW_Shopp GAnumber:732590 H2020-ICT-2016-2017/H2020-ICT-2016-1 16
Acronyms table
Abbreviations and acronyms used in this deliverable are reported in Table 1.
Table 1. Abbreviations and acronyms
DMP   Data Management Plan
DPA   Data Protection Authority
DPIA  Data Protection/Privacy Impact Assessments
EAN   European Article Number
EC    European Commission
EU    European Union
GDPR  General Data Protection Regulation
GPC   Global Product Classification
GS1   Global Standards One
IPR   Intellectual Property Rights
LOD   Linked Open Data
PbD   Privacy by Design
PD    Personal Data
RDB   Relational database
ROM   Rough Order of Magnitude
Chapter 1 Introduction
According to the Guidelines on FAIR Data Management in Horizon 2020, a Data Management Plan (DMP) is a key element of good data management. A DMP describes the data management life cycle for the data to be collected, processed and/or generated by a Horizon 2020 project.
This document sets up a DMP in accordance with the H2020 Guidelines. It includes information and suggestions about the handling of data during and after the end of the project: what data will be collected, processed and/or generated, which methodology and standards will be applied, whether data will be shared or made open access, and how data will be curated and preserved (including after the end of the project).
In addition to the guidelines provided by the European Commission, this document also refers to the plan to address the legal and ethical issues related to the data that will be collected.
The deliverable describes the approach established in EW-Shopp to ensure the life-cycle management of the public and proprietary datasets provided by the consortium members to the project, as well as of other datasets produced by the Consortium during the project execution, as defined at M6.
Chapter 1 defines the principles underlying the EW-Shopp DMP, the approach followed to generate the structure, the main contents of the document, and the links to other deliverables and documents. Chapter 2 introduces the EW-Shopp project, its purpose, the kinds of datasets involved in the project, the audience, and the responsibilities defined around the DMP. Chapter 3 introduces core concepts and fundamental legal principles, outlines an ethical assessment for data owners and, concerning legal requirements, provides detailed guidelines about the obligations that data owners need to comply with. Chapter 4 gives a high-level description of the four business cases in order to provide an overall view of the project scope. Chapter 5 explains the relevant information regarding the datasets and describes the process used to collect this information from data owners. Chapter 6 shows, for each dataset, all the information required for dataset identification, origin, format, access, security, and ethical and legal requirements. Data storage policies, data archiving, security, permissions, data access, re-use and licensing are discussed in Chapter 7.
Finally, the survey that was submitted to all dataset providers is reported in Annex A.
1.1 Principles underlying EW-Shopp DMP
The EW-Shopp project aims at deploying and hosting a platform to ease data integration tasks by embedding shared data models, robust data management techniques and semantic reconciliation methods. This platform will offer a framework for the unification of fragmented business data and its integration with external event and weather data, which will support data analytics services that offer key competitive advantages in the modern commerce space.
In general, research data should be 'FAIR', that is, findable, accessible, interoperable and re-usable. These principles precede implementation choices and do not necessarily suggest any specific technology, standard, or implementation solution. In this context, the Data Management Plan is a key activity, and it elaborates on the general principles underlying the EW-Shopp Data Management Plan (from [DoA]):
• EW-Shopp Privacy Policy: We will set up and explicitly define a Privacy Policy adopted in the EW-Shopp project, with which all partners and data processing activities carried out in the project must comply. […] In case some PD is used in some intermediate data processing step, this information will be properly anonymized and used only upon consent to secondary use collected from the users. The EW-Shopp Privacy Policy will assure that data processing activities in EW-Shopp comply with national and EU legislation, including legislation on personal data protection.
• Statistical data not containing PD: The majority of datasets consist of statistical data (all datasets classified as not containing personal data in the data description tables). These data do not contain PD but only information treated at an aggregate level that cannot be linked back to single individuals. Therefore, the specific data subjects will not be visible or recognizable in such sets of data. These data have been collected by business partners in their daily operations in compliance with national regulations, both in relation to privacy protection and informed consent to data processing.
• Anonymization of data containing PD: Other datasets are classified as containing personal data in the data description tables. These data will be anonymized before being used in the project so as to comply with the privacy protection policy and national and EU legislation. Among these datasets, we consider three notable cases, for which we specify how we plan to ensure privacy protection constraints.
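The last two principles can be illustrated with a minimal Python sketch. It is not the procedure adopted by the project: the field names (customer_id, name, email, phone), the keyed-hash approach and the minimum group size of 5 are all assumptions made for illustration. Note also that replacing an identifier with a keyed hash is pseudonymization rather than full anonymization; only the aggregated output resembles the statistical data described above.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical direct identifiers; real schemas are given in the data description tables.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(record, secret_key, id_field="customer_id"):
    """Drop direct identifiers and replace the customer ID with a keyed SHA-256 hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    cleaned[id_field] = digest.hexdigest()
    return cleaned


def aggregate(records, group_fields, min_group_size=5):
    """Count records per group and suppress groups smaller than min_group_size."""
    counts = Counter(tuple(r[f] for f in group_fields) for r in records)
    return {group: n for group, n in counts.items() if n >= min_group_size}


if __name__ == "__main__":
    key = b"replace-with-a-secret-key"  # assumption: the key is managed by the data owner
    raw = {"customer_id": 42, "name": "Ada", "email": "ada@example.org",
           "city": "Milan", "category": "sport"}
    print(pseudonymize(raw, key))
```

Suppressing small groups is one simple way to reduce the risk that an aggregate figure can be linked back to a single individual; the appropriate threshold depends on the dataset.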
1.2 General Approach
The EW-Shopp DMP will be developed by taking into account the DMP template that matches the demands and suggestions of the Guidelines on Data Management in Horizon 2020 and that is available through the DMPonline platform2.
The principal contents indicated in the template are listed below:
• Dataset description
• FAIR data (making data findable, accessible, interoperable and reusable)
• Data security
• Data archiving and preservation
• Ethical aspects
These contents were used as a guide, and the document was then customized according to specific study requirements.
2 https://dmponline.dcc.ac.uk/
1.3 Applicable documents and references
The following documents are applicable to the subject discussed in this deliverable, and will be referenced as indicated in round brackets:
1. EW-Shopp – Grant Agreement number 732590 ([GA])
2. [GA] Annex 1 – Description of the Action ([DoA])
3. EW-Shopp – Consortium Agreement ([CA])
4. D7.2 POPD - Requirement No. 2 ([D7.2])
Short references may be used to refer to project beneficiaries, also frequently referred to as partners. References are listed in Table 2.
Table 2. Short references for project partners
No.  Beneficiary (partner) name as in [GA]                    Short reference
1    UNIVERSITA’ DEGLI STUDI DI MILANO-BICOCCA                UNIMIB
2    CENEJE DRUZBA ZA TRGOVINO IN POSLOVNO SVETOVANJE DOO     CE
3    BROWSETEL (UK) LIMITED                                   BT
4    GFK EURISKO SRL.                                         GFK
5    BIG BANG, TRGOVINA IN STORITVE, DOO                      BB
6    MEASURENCE LIMITED                                       ME
7    JOT INTERNET MEDIA ESPAÑA SL                             JOT
8    ENGINEERING – INGEGNERIA INFORMATICA SPA                 ENG
9    STIFTELSEN SINTEF                                        SINTEF
10   INSTITUT JOZEF STEFAN                                    JSI
1.4 Updates of this deliverable
This deliverable will be updated over the course of the project whenever significant changes arise, to ensure compliance with Horizon 2020 guidelines. Such changes include the addition of new datasets, changes in consortium policies or in consortium composition, and external factors.
Chapter 2 Project Data Management
2.1 Project purposes
EW-Shopp aims at supporting companies operating in the fragmented European ecosystem of the eCommerce, Retail and Marketing industries to increase their efficiency and competitiveness by leveraging deep customer insights that are too challenging for them to obtain today.
Improved insights will result from the analysis of large amounts of data, acquired from different sources and sectors, and in multiple languages. The integration of consumer and market data collected by different business partners will cover customer interactions and activities across different channels, providing insights on rich customer journeys. These integrated data will be further enriched with information about weather and events, two crucial factors impacting consumer choices.
By increasing the analytical power that comes from integrating cross-sectorial and cross-language data sources with new data sources, companies will deploy real-time responsive services for digital marketing, reporting-style services for market research, advanced data and resource management services for Retail & eCommerce companies and their technology providers, and enhanced location intelligence services. For example, by using a predictive model built on top of integrated data about the click-through rate of products, weather and events, we will develop a service that is able to increase advertising of top-gear sport equipment on a sunny weekend afternoon during the Tour de France.
To realize these objectives, a platform, also referred to as the EW-Shopp platform, will be built. The platform will support:
• The integration of consumer and market data, covering customer interactions across different channels and in different languages, and providing insights on rich customer journeys
• The enrichment of the integrated data with information about weather and events
• The analysis of the enriched data using visual, descriptive and predictive analytics
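As a rough illustration of the enrichment and analysis steps listed above, the following Python sketch joins interaction records with weather and event records and then computes a simple descriptive statistic. It is only a sketch under stated assumptions: the record fields (date, location, clicks, impressions, condition, name) and the join keys are hypothetical and do not reflect the data models of the EW-Shopp platform.

```python
from collections import defaultdict


def enrich(interactions, weather, events):
    """Attach weather (by date and location) and events (by date) to each interaction."""
    weather_by_key = {(w["date"], w["location"]): w["condition"] for w in weather}
    events_by_date = defaultdict(list)
    for e in events:
        events_by_date[e["date"]].append(e["name"])
    enriched = []
    for row in interactions:
        key = (row["date"], row["location"])
        enriched.append({
            **row,
            "weather": weather_by_key.get(key),  # None when no weather record matches
            "events": list(events_by_date.get(row["date"], [])),
        })
    return enriched


def click_through_rate_by_weather(enriched):
    """Descriptive analytics: total clicks divided by total impressions per weather condition."""
    totals = defaultdict(lambda: [0, 0])
    for row in enriched:
        totals[row["weather"]][0] += row["clicks"]
        totals[row["weather"]][1] += row["impressions"]
    return {cond: clicks / shown for cond, (clicks, shown) in totals.items() if shown > 0}


if __name__ == "__main__":
    interactions = [
        {"date": "2017-07-08", "location": "Milan", "clicks": 50, "impressions": 100},
        {"date": "2017-07-09", "location": "Milan", "clicks": 25, "impressions": 100},
    ]
    weather = [
        {"date": "2017-07-08", "location": "Milan", "condition": "sunny"},
        {"date": "2017-07-09", "location": "Milan", "condition": "rainy"},
    ]
    events = [{"date": "2017-07-08", "name": "Tour de France"}]
    print(click_through_rate_by_weather(enrich(interactions, weather, events)))
```

A predictive service such as the advertising example in the text would use enriched records of this kind as training input; the model itself is outside the scope of this sketch.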
2.2 Project data
EW-Shopp makes use of a mix of public and proprietary datasets. The broad classes of data include the following:
• Market data – data extracted from marketing research and commercial activity
• Consumer data – profiles from marketing research, e-commerce, digital advertising, and IoT devices
• Category/product data – data coming from commercial activities
• Events reported in media – popular online media data
• Weather data and forecasts
The EW-Shopp platform will provide data services and tools to process and harmonise data. It will produce a set of agreed data models, including a shared system of entity identifiers to represent the aforementioned datasets. The data will furthermore be represented in a way that provides support for multiple input languages.
2.3 Audience
Project data are intended for:
• The consortium partners;
• All stakeholders involved in the project;
• The European Commission.
Because of the sensitivity of the business data used in the EW-Shopp innovation action, no commitment to publish datasets provided by business partners as open data is made in [DoA]. For this reason, we do not include external stakeholders in the audience for project data. By external stakeholder we mean any party that is not a beneficiary, is not a linked third party in EW-Shopp, and is not the European Commission. Although we do not expect to make datasets openly accessible to external stakeholders, models and methodologies developed in the project to support interoperability between different parties will be disseminated to a larger audience of stakeholders.
2.4 Roles and responsibilities
We describe the main roles of beneficiaries in the consortium and their responsibilities with regard to data and services developed in business cases in Table 3 (Roles and Responsibilities of Beneficiaries). In the table, we refer to Business Cases by their number; the business cases are further explained in Chapter 4.
In the table, we distinguish between two main roles of beneficiaries in the consortium:
- Business partners: partners that develop services within the project by exploiting the technology developed in the project, i.e., the EW-Shopp platform, on their own datasets and/or with the help of datasets provided by other partners in the project. These partners will also contribute indirectly to the technology by driving its development with the specifications coming from their business cases.
- Technology partners: partners whose main role in the project is to develop the technology that will support the EW-Shopp platform. These partners will also contribute indirectly to the business cases by performing the following activities:
  o Providing or supporting access to core datasets, i.e., datasets such as product data, locations, weather and events, used to integrate and enrich business data.
  o Supporting the development of pilots and services by helping business partners integrate or analyze the data.
Table 3. Roles and Responsibilities of Beneficiaries
Partner  Partner Role  Resp. wrt Data  Resp. wrt Business Cases
Business  Tech.  Owner  Facilitator  Service  Data  Tech. Support (Integration)  Tech. Support (Analytics)
UNIMIB  X  X  BC2, BC3
CE      X  X  BC1  BC1
BT      X  X  X  BC1  BC1
GFK     X  X  X  BC2  BC1, BC2
BB      X  X  BC1  BC1
ME      X  X  BC3  BC3
JOT     X  X  BC4  BC4
ENG     X  BC4  BC ALL
SINTEF  X  BC1
JSI     X  X  X  BC ALL
At a general level, responsibilities with respect to data managed in the project can be summarized as follows:
- Data owner, a partner that provides to the consortium data that it owns
- Data facilitator, a partner that eases access to data that are:
  o provided by beneficiaries (e.g., UNIMIB will support access to product data owned by GFK)
  o provided by linked third parties (e.g., JSI will provide access to weather data provided by ECMWF)
  o available as open data (e.g., UNIMIB will provide access to relevant data about locations available in sources such as DBpedia3)
Partners may thus have different responsibilities with respect to the development of business cases and pilots (see Table 3 for the specification of the responsibilities of individual beneficiaries in each business case):
- Service developer (referred to as “Service” in the table) is a beneficiary that is responsible for developing a service within a business case.
- Data provider (referred to as “Data” in the table) is a beneficiary that is responsible for providing its data to support a business case.
- Technical support (integration) is a technology partner that is responsible for providing support in a business case by helping business partners in the data integration process.
- Technical support (analytics) is a technology partner that is responsible for providing support in a business case by helping business partners in the data analytics process.
3 dbpedia.org
The assignment of business cases to technology partners may be subject to change in the course of the project; Table 3 reports the assignments that have been used to collect the requirements included in this document. In addition to the EW-Shopp beneficiaries, the project also includes two third parties having a role in the project:
- European Centre for Medium-Range Weather Forecasts (ECMWF) is an independent intergovernmental organisation founded in 1975 and supported by 34 states (http://www.ecmwf.int). Data from ECMWF are provided to the EW-Shopp project to be used by every partner. ECMWF will contribute to EW-Shopp by making available, for the scope of the project, its meteorological archive of forecasts (MARS) of the past 35 years and sets of reanalysis forecasts.
- CDE is a Slovene Ltd IT company, linked to Browsetel (BT), that provides IT solutions for communication and customer relation management. CDE will act as a data and infrastructure provider and software developer in the context of BC1 in WP4, while BT will focus on business development. The responsibilities of CDE in EW-Shopp are included in the responsibilities of BT in Table 3.
Chapter 3 Ethics and Legal Compliance
3.1 Legal requirements regarding personal data
The EW-Shopp project must comply with all EU laws regarding data protection. The purpose of this section is to explain core principles and concepts of the right to protection of personal data in scientific research.4
In the 1990s, the European Union started a process of codification of data protection and privacy rights in order to harmonise different national legislation. Directive 95/46/EC5 (“Data Protection Directive”) and Directive 2002/58/EC6 (“E-Privacy Directive”) are the main legal provisions that define the legal framework, together with the EU Charter of Fundamental Rights7 and the national legislation that transposed these EU directives.
This multilevel legal environment is going to change in May 2018, when a new European Regulation starts to apply.8 Indeed, the General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679)9 was approved by the EU Parliament on 14 April 2016. It entered into force 20 days after its publication in the EU Official Journal and will be directly applicable in all member states two years after this date. It is designed to harmonize data privacy laws across Europe, to protect and empower all EU citizens' data privacy and to reshape the way organizations across the region approach data privacy.
Although the new Regulation confirms the main principles of both the above-cited Directives, it will replace them and all national legislation on data protection and privacy rights.
3.1.1 Core concepts
4 According to article 19 Regulation (EU) n. 1291/2013 (Horizon 2020): “all the research and innovation activities carried under Horizon 2020 shall comply with ethical principles and relevant national, Union and international legislation, including the Charter of Fundamental Rights of the European Union and the European Convention on Human Rights and its Supplementary Protocols. Particular attention shall be paid to the principle of proportionality, the right to privacy, the right to the protection of personal data, the right to the physical and mental integrity of a person, the right to non-discrimination and the need to ensure high levels of human health protection.”
5 Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data.
6 Directive 2002/58/EC of the European Parliament and of the Council of 12 July 2002 concerning the processing of personal data and the protection of privacy in the electronic communications sector (Directive on Privacy and Electronic Communications). Later this Directive was amended with Directive 2009/136/EC of the European Parliament and of the Council of 25 November 2009.
7 Article 8 (Protection of Personal Data) of the EU Charter of Fundamental Rights: “1. Everyone has the right to the protection of personal data concerning him or her. 2. Such data must be processed fairly for specified purposes and on the basis of the consent of the person concerned or some other legitimate basis laid down by law. Everyone has the right of access to data which has been collected concerning him or her, and the right to have it rectified. 3. Compliance with these rules shall be subject to control by an independent authority.”
8 Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).
9 http://www.eugdpr.org/
European Data Protection legislation is based on some core concepts concerning the subjects who are going to acquire, collect, process, profile, and use data; the different types of data; and notification procedures. Below are listed the most important definitions for scientific research activities. These definitions have been extrapolated from EU legislation, EU and Member State (MS) official documents, or other legal documents.
All text in italics relates to the new 2018 European regulation and its additional requirements.
Table 4 Core concepts - European Data Protection legislation
CORE CONCEPT / Definition
SUBJECTS IN DATA PROCESS
Data Controller10: The natural or legal person which, alone or jointly with others, determines the purposes and means of the processing of personal data.
Data Processor11: A natural or legal person which processes personal data on behalf of the controller.
DIFFERENT TYPES OF DATA
Personal Data12: Any information relating to an identified or identifiable natural person (“data subject”); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person. Personal data may be processed only if the data subject has unambiguously given his consent (“prior consent”).
NB: Anonymised data are no longer personal data. See below.
Sensitive (Personal) Data13: Personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation. Sensitive data may be processed only if the data subject has given his explicit consent to the processing of those data (“prior written consent”).
NB: Anonymised data are no longer personal data. See below.
Genetic Data14: personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique
10 Art. 2, lett. d), Directive 95/46/EC and art. 4, n. 7), Regulation (EU) 2016/679.
11 Art. 2, lett. e), Directive 95/46/EC and art. 4, n. 8), Regulation (EU) 2016/679.
12 Art. 2, lett. a), Directive 95/46/EC and art. 4, n. 1), Regulation (EU) 2016/679.
13 Art. 8, Directive 95/46/EC and art. 9, Regulation (EU) 2016/679.
14 Art. 4, n. 13), Regulation (EU) 2016/679.
information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question.
NB: Anonymised data are no longer personal data. See below.
Biometric Data15: personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person.
NB: Anonymised data are no longer personal data. See below.
Anonymization (Anonymised Data)16: Processing of data with the aim of removal of information that could lead to an individual being identified. Data can be considered anonymised when it does not allow identification of the individuals to whom it relates, and it is not possible that any individual could be identified from the data by any further processing of that data or by processing it together with other information which is available or likely to be available. Use of anonymised data does not require the consent of the “data subject.”
Simulated Data: Imitation or creation of data that closely matches real-world data, but is not real-world data. For these data, consent is not necessary since it is not possible to identify the “data subject.”
Pseudonymization17: The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.
Big Data18: High-volume, high-velocity, high-value and high-variety information (4Vs) assets that demand innovative forms of information processing.
15 Art. 4, n. 14), Regulation (EU) 2016/679.
16 For the definition of, for example, the Irish Data Protection Authority, see https://www.dataprotection.ie/docs/Anonymisation-and-pseudonymisation/1594.htm and the UK Information Commissioner, see https://ico.org.uk/for-organisations/guide-to-data-protection/anonymisation/.
17 Art. 4, n. 5), Regulation (EU) 2016/679.
18 For the 4Vs theory see Big Data to Smart Data, Iafrate Fernando [2015]. The UK Data Protection Authority refers to Gartner’s definitions “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”, Information Commissioner’s Office, Big Data and Data Protection, 6 [2014].
Open Data19: Data that can be freely used, re-used, and redistributed by anyone, subject only, at most, to the requirement to attribute and share-alike.
PROCESSES
Processing of Personal Data20: Any operation (or set of operations) that is performed upon personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organization, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure, or destruction.
Profiling21: Any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular, to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location, or movements.
NOTIFICATION
Notification: According to different national legislation, data controllers have to notify their National Data Protection Authority (DPA) of their intention to use data before starting to process data. Requirements, notification processes, and conditions vary across national DPAs.
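The pseudonymization concept defined in Table 4 can be illustrated with a minimal sketch. It assumes one possible technique, a keyed hash (HMAC-SHA256), which is not prescribed by this deliverable; the field name `customer_id`, the sample record and the key value are hypothetical. The secret key plays the role of the "additional information" that must be kept separately from the data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest that stands in for the identifier."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, id_field: str, secret_key: bytes) -> dict:
    """Copy the record, replacing the direct identifier with its pseudonym."""
    out = dict(record)
    out[id_field] = pseudonymize(str(record[id_field]), secret_key)
    return out


# Hypothetical values; in practice the key is stored apart from the dataset
# and protected by technical and organisational measures.
key = b"example-key-kept-separately"
record = {"customer_id": "C-1001", "basket_total": 42.5}
pseudo = pseudonymize_record(record, "customer_id", key)
```

Because the same key always yields the same pseudonym, records of one data subject can still be linked across datasets, which is why pseudonymised data remain personal data, unlike anonymised data.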
3.1.2 Fundamental Principles
European Data Protection legislation provides that personal data must be collected, used, and processed fairly, stored safely, and not disclosed to any other person unlawfully. From this perspective, we can outline the following fundamental principles regarding personal data use22:
1. Personal data must be obtained and processed fairly, lawfully, and in a transparent way23: according to EU and MS’s national legislation the data controller has to respect certain conditions, for example carry out the notification process before starting to collect personal data or obtain prior consent from the natural person (the “data subject”) before collecting his/her personal data;
2. Personal data should only be collected for specified, explicit, and legitimate purposes and not further processed in any way incompatible with those purposes: personal data must be
19 Definition of Open Data Handbook, http://opendatahandbook.org/guide/en/what-is-open-data/
20 Art. 2, lett. b), Directive 95/46/EC and art. 4, n. 2), Regulation (EU) 2016/679. In italics part of sentences added by the new Regulation (EU) 2016/679.
21 Art. 4, n. 4), Regulation (EU) 2016/679.
22 These principles are extrapolated from Directive 95/46/EC.
23 New EU regulation has required also that personal data are processed in a transparent manner (article 5, Regulation (EU) 2016/679).
collected for specific, clear, and lawfully stated purposes, which the data controller has to specify to the “data subject” and to the national Data Protection Authority (DPA);
3. Personal data should be used in an adequate, relevant, and not excessive way in relation to the purposes for which they are collected and/or further processed: processing of personal data should be compatible with the specified purposes for which it was obtained;
4. Keep personal data accurate, complete, and, where necessary, up-to-date;
5. Keep personal data safe and secure: the data controller must assure adequate technical, organisational, and security measures to prevent unauthorised or unlawful processing, alteration, or loss of personal data;
6. Retain personal data for no longer than is necessary: personal data should not be kept for longer than is necessary for the purposes for which it was obtained;
7. Restricted transfer of personal data overseas: personal data may not be transferred to a country outside of the European Union and European Economic Area unless that country ensures an adequate level of protection.
The new European Regulation has also added some other principles to correctly manage privacy and data protection rights. These new principles provide as follows:
• Data Controller accountability: taking into account the nature, scope, context, purposes, and risks of processing, the Data Controller has to implement appropriate technical and organisational measures.24
• Principles of data protection by design and by default25 must be applied:
  o Privacy by design26: The Data Controller, before starting collection and processing of personal data as well as during the processing itself (“the whole life cycle of data”), has to implement appropriate technical and organisational measures, such as pseudonymization, which are designed to implement data protection principles, such as data minimisation, in an effective manner and to integrate the necessary safeguards into the processing. In other words, before starting “working” with personal data, the entire process from the start has to be designed in compliance with the required technical and legal safeguards of data protection regulations (e.g. adequate security);
  o Privacy by default: The Data Controller has to implement appropriate technical and organisational measures for ensuring that, by default, only personal data that are necessary for each specific purpose of the processing are processed.27
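The privacy-by-default principle above can be sketched as a purpose-based field filter. This is an illustrative assumption rather than a procedure defined in this deliverable: the purposes, field names and sample record below are hypothetical.

```python
# Hypothetical mapping from a processing purpose to the fields it needs.
FIELDS_BY_PURPOSE = {
    "sales_forecast": {"product_id", "date", "quantity"},
    "email_campaign": {"customer_pseudonym", "segment"},
}


def minimise(record: dict, purpose: str) -> dict:
    """Keep only the fields registered as necessary for the given purpose."""
    # An unregistered purpose gets no fields at all: the default is to share nothing.
    allowed = FIELDS_BY_PURPOSE.get(purpose, set())
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "product_id": "P-7",
    "date": "2017-06-30",
    "quantity": 3,
    "customer_email": "a@example.org",
}
forecast_view = minimise(record, "sales_forecast")
```

The design choice worth noting is the empty default: a field is processed only if someone has explicitly declared it necessary for a purpose.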
More specifically, the core concepts28 of “Privacy by design” (PbD) are:
1. Being proactive not reactive, preventative not remedial: The “PbD approach is characterized
24 Art. 24, Regulation (EU) 2016/679.
25 Art. 25, Regulation (EU) 2016/679.
26 The “privacy by design” approach was developed by the Information and Privacy Commissioner of Ontario, Canada in the mid-1990s, see https://www.ipc.on.ca/wp-content/uploads/2013/09/pbd-primer.pdf and https://www.iab.org/wp-content/IAB-uploads/2011/03/fred_carter.pdf. Some European Data Protection Authorities directly referred to this approach, even before “Privacy by design” was explicitly provided for in the new European regulation.
27 For a practical guide on how privacy by design and by default principles can be made concrete and effective see European Union Agency for Network and Information Security (ENISA), Privacy and Data Protection by Design: From Policy to Engineering, December 2014, https://www.enisa.europa.eu/publications/privacy-and-data-protection-by-design.
28 Concepts are extrapolated from the PbD approach of the Information and Privacy Commissioner of Ontario, see https://www.ipc.on.ca/wp-content/uploads/2013/09/pbd-primer.pdf.
by proactive rather than reactive measures. It anticipates and prevents privacy invasive events before they happen. PbD does not wait for privacy risks to materialize, nor does it offer remedies for resolving privacy infractions once they have occurred - it aims to prevent them from occurring. In short, Privacy by Design comes before-the-fact, not after”;
2. Having privacy as the default setting: “PbD seeks to deliver the maximum degree of privacy by ensuring that personal data are automatically protected in any given IT system or business practice. If an individual does nothing, their privacy still remains intact. No action is required on the part of the individual to protect their privacy - it is built into the system, by default”;
3. Having privacy embedded into design: “PbD is embedded into the design and architecture of IT systems and business practices. It is not bolted on as an add-on, after the fact. The result is that privacy becomes an essential component of the core functionality being delivered. Privacy is integral to the system, without diminishing functionality”;
4. Avoiding the pretence of false dichotomies, such as privacy vs. security: “PbD seeks to accommodate all legitimate interests and objectives in a positive-sum win-win manner, not through a dated, zero-sum approach, where unnecessary trade-offs are made. PbD avoids the pretence of false dichotomies, such as privacy vs. security - demonstrating that it is possible to have both”;
5. Providing full life-cycle management of data: “PbD, having been embedded into the system prior to the first element of information being collected, extends securely throughout the entire life cycle of the data involved - strong security measures are essential to privacy, from start to finish. This ensures that all data are securely retained, and then securely destroyed at the end of the process, in a timely fashion. Thus, PbD ensures cradle to grave, secure life cycle management of information, end-to-end”;
6. Ensuring visibility and transparency of data: “PbD seeks to assure all stakeholders that whatever the business practice or technology involved, it is in fact, operating according to the stated promises and objectives, subject to independent verification. Its component parts and operations remain visible and transparent, to users and providers alike. Remember, trust but verify”;
7. Being user-centric and respecting user privacy: “PbD requires architects and operators to protect the interests of the individual by offering such measures as strong privacy defaults, appropriate notice, and empowering user-friendly options. Keep it user-centric”.
3.1.3 Notification process and data protection impact assessment
Generally, every data controller has to notify its national Data Protection Authority (DPA) of its decision to start collection of personal data before starting this process. This notification aims at communicating in advance the creation of a new “database,” explaining the reasons for and purposes of this, and the technical and organisational safeguards in place to protect the personal data. Consequently, DPAs are enabled to verify the legal and technical safeguards required by EU legislation. However, the conditions attaching to and the procedures for submitting such a notification differ from EU state to EU state, with the strongest protections in place in Germany and the Netherlands and the least in Ireland and the UK.
The new European Regulation, however, will introduce a different way to manage data protection issues, following PbD principles. Each Data Controller has to carry out an assessment of the impact
of processing operations on the protection of personal data before starting the processing itself to evaluate the origin, nature, particularity, and severity of risk29 attaching to their proposed processing. Such Data Protection/Privacy Impact Assessments (DPIA) can then be utilised to define appropriate measures to assure data protection and compliance with EU legislation.
A DPIA is required in case of:
• Systematic and extensive evaluation of personal aspects in automated processing (e.g. profiling);
• Processing on a large scale of sensitive data or of personal data relating to criminal convictions and offences;
• Systematic monitoring of a publicly accessible area on a large scale.
The main aspects of DPIAs are:
a) Systematic description of processing operations and the purposes of the processing;
b) Assessment of the necessity and proportionality of the processing operations in relation to the purposes;
c) Assessment of the risks to the rights and freedoms of data subjects;
d) Measures to deal with the risks, including safeguards, security measures, and mechanisms to ensure data protection and to demonstrate compliance with EU legislation.
In the event that a DPIA indicates a high risk in terms of data protection and privacy rights, the Data Controller must consult the National Data Protection Authority prior to the processing.30
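The three DPIA triggers and the prior-consultation rule described above can be written down as a small decision sketch. The class and attribute names are hypothetical and chosen only for illustration; the sketch simplifies the legal text and is not a substitute for a legal assessment.

```python
from dataclasses import dataclass


@dataclass
class ProcessingOperation:
    """Yes/no answers to the three DPIA triggers listed in this section."""
    systematic_automated_evaluation: bool         # e.g. profiling
    large_scale_sensitive_or_criminal_data: bool  # sensitive data, convictions, offences
    large_scale_public_monitoring: bool           # publicly accessible area


def dpia_required(op: ProcessingOperation) -> bool:
    """A DPIA is required as soon as any one of the three triggers applies."""
    return (
        op.systematic_automated_evaluation
        or op.large_scale_sensitive_or_criminal_data
        or op.large_scale_public_monitoring
    )


def must_consult_dpa(op: ProcessingOperation, dpia_shows_high_risk: bool) -> bool:
    """Prior consultation is needed when a required DPIA indicates a high risk."""
    return dpia_required(op) and dpia_shows_high_risk


profiling = ProcessingOperation(True, False, False)
plain_reporting = ProcessingOperation(False, False, False)
```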
3.1.4 Notification process in EW-Shopp project
The use of datasets within the EW-Shopp project has to comply with applicable international, EU and national law (in particular, EU Directive 95/46/EC).
To this aim, data owners have been asked to evaluate each of their datasets in order to confirm the nature and sensitivity of the data to be used within the EW-Shopp project.
In order to make this evaluation, dataset owners have to clarify, for each dataset, whether it contains personal data (PD). If the dataset contains PD, they have to provide notification and informed consent for secondary use.
If the dataset to be used for the EW-Shopp project does not contain PD, it must be clarified whether it is derived from a dataset which contains PD. If it is, the data owner should prepare a statement which explains that he will not use data produced in the project to enrich the dataset containing PD for DMP aims, and also provide the notification with the EC regarding the original dataset which contains PD, to be included in deliverable [D7.2].
29 Art. 35, Regulation (EU) 2016/679.
30 Art. 36, Regulation (EU) 2016/679.
If the dataset does not contain PD (or derives from a dataset that does not contain PD), the data owner should provide a statement which details that his own dataset does not contain PD (explaining the implemented procedures, etc.).
All the notifications and copies of opinions provided by owners of datasets which contain PD will be collected in deliverable [D7.2].
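The three cases of the notification process described in this section can be summarised as a small decision function. This is only a sketch: the function name and the wording of the returned document labels are hypothetical shorthand for the documents named in the text.

```python
from typing import List


def required_documents(contains_pd: bool, derived_from_pd: bool) -> List[str]:
    """Return the documents a data owner has to provide for one dataset."""
    if contains_pd:
        # Case 1: the dataset itself contains personal data (PD).
        return ["notification", "informed consent for secondary use"]
    if derived_from_pd:
        # Case 2: no PD in the dataset, but it derives from a dataset with PD.
        return [
            "statement that project data will not be used to enrich the PD dataset",
            "notification regarding the original dataset containing PD",
        ]
    # Case 3: neither the dataset nor its source contains PD.
    return ["statement that the dataset does not contain PD"]


docs_for_derived_dataset = required_documents(contains_pd=False, derived_from_pd=True)
```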
3.2 Ethics requirements regarding the involvement of human rights
The EW-Shopp project is implemented considering fundamental ethical standards to ensure quality and excellence during and after the life of the project. Horizon 2020 specifies that ethical research conduct implies the application of fundamental ethical principles and legislation to scientific research in all possible domains of research. According to the procedure established in Horizon 2020 in terms of Ethics, in order to engage scientific research with the ethical dimension, in the EW-Shopp project each BC owner has been asked to answer the following questions:
• Are there any ethical issues that can have an impact on data sharing?
• Have you taken the necessary measures to protect humans’ rights and freedoms?
• How did/could these measures impact the BC?
• Do you assess the risks linked to the specific type of data your organization provides?
3.3 Intellectual Property Rights
In the context of the EW-Shopp project, IPR ownership is fundamentally regulated by the underlying principles of two main official documents (namely [CA] and [GA]), but further considerations will be detailed within the WP5 frame and provided in its outcome “D5.4 – Update of Exploitation and Dissemination Strategy (M24)”.
Two main concerns on IPR management could impact the current deliverable:
• Existing or developed datasets will be available to the whole Consortium during the project timespan, but any further use in exploitation activities must follow specific limitations and/or conditions (as stated in Article 25.3 of the [GA] and described in its Attachment 1).
• All the identified datasets will be available to all Beneficiaries in order to develop the business cases used to validate the project results, as explicitly mentioned in the description tables contained in “Chapter 6 - Dataset description” (see Dataset ACCESS section).
Chapter 4 Business Case high-level description
The main business objective of EW-Shopp is to develop a cross-domain data integration platform that would enable the fragmented European business ecosystem to increase efficiency and competitiveness through building relevant customer insights and business knowledge. This platform will enable us to regain lost positions in competing against global internet service giants that managed to base their growth and sector transformation on intensive exploitation of integrated big data generated at their proprietary platforms.
4.1 BingBang, CENEJE (BC1)
The goal of the business case is to follow user experience based on real-time cross-channel data integration. The business case will develop analytical predictive models for managing marketing activities, sales resources, operations, data quality and content management that will increase partner efficiency and sales. It will furthermore enable the development of market data enrichment services and consequent monetization. This will be done through integrating cross-channel intent, research, interest, interaction and purchase data with point of sales solutions.
The data that will be integrated are:
• Purchase intent: A collection of user journey data - page views, search terms, redirects to sellers and similar.
• Product attributes: A collection of product attributes (varying from generic such as name, EAN, brand, categorization and color to more specific as dimensions or technical specifications).
• Products price history: A collection of seller quotes for products.
• Customer purchase history: Sell-out data matched with customer baskets in a defined timeframe.
• Consumer intent and interaction: A collection of user journey data from Google Analytics - page views, page events, search terms, redirects to channels, etc.
• Contact and Consumer interaction history: calls (outbound, inbound and simulated calls), other contact events (email, SMS, click-through, fax, scan, or any other document) and other events.
To achieve the business case goals, in EW-Shopp we will set up a virtual lab in a data cloud environment where we will create a set of scenarios by integrating partner datasets of anonymized user paths to purchase that should include all possible engagements, decisions and purchase information. The data will be used in order to:
• develop models of purchase behavior;
• cluster similar behaviors to optimize operations;
• enable user experience advertising;
• develop efficient sales promotions;
• provide efficient marketing and communication tools;
• build segmented mailing groups for efficient automatization of e-mail marketing;
• increase efficiency in above-the-line (mass media) and below-the-line (one to one) activities;
• create efficient POS solutions for sales.
4.2 GfK (BC2)
The goal of business case two is to find which external variables, and with which weights, predict the sales and success of products. Besides the integration between the two datasets provided by GfK, this business case also aims at integrating external data such as events and weather data, in order to improve predictability.
The business case involves two services, Retail Sales Data Reporting System and Echo: the former makes it possible to maximize sales and profit in order to keep customers coming back, while the latter tracks and improves the experiences of customers in real time. The predictive model learned from the integrated data about customer feedback as well as third-party data will identify which actions drive growth.
The data that will be integrated are:
• Market data: Sales data (tech goods), Product Attributes and Prices Data (tech goods), and Purchase Data
• Consumers data: Demographics, TV Behaviour & Exposure Data (passive / survey), Online Behavior & Exposure Data, Individual Purchase Data (passive / survey), and Mobile Usage & Exposure Data
• Event data, including Sport Events (World cup, Champion, Olympic games, etc.), Social Events (strikes, terrorism, epidemics, etc.), Political Events (elections, relevant laws, etc.), Natural Events (earthquake, floods, etc.)
• Historical Weather Data: relevant weather information across different countries
• Social media data: measures of customer engagement across different platforms (e.g., email marketing, search)
• Purchase intent and search data: data about purchase research and intent by category and search behaviour based on keyword interaction through advertising.
4.3 Measurence (BC3)
The goal of the business case is to improve the Measurence Scout, a location scouting solution that helps in choosing the best location for a business. This will optimize real estate investments by analyzing the traffic around the location of interest.
The traffic data, after being anonymized, are collected by Measurence WiFi technology at a high level of granularity. Moreover, in order to better understand the potential location, Measurence also needs external data such as weather data, event data, geographic data, sales data of businesses, etc.
The data that are planned to be integrated are:
• Weather data at a high level of granularity
• Events data around a location: we need to be able to filter these events based on their venue and, ideally, on the number of people expected to join the events
• Geographical data: Businesses in the area (shopping, restaurants etc.), schools, tourist attractions, nightlife, etc.
• Sales data: business volume of businesses in the area aggregated by kind of activity (e.g. restaurants, clothes shop, etc.)
4.4 JOT (BC4)
The goal of the JOT business case is to use big data technology and integrate cross-domain purchase intention data on the level of search, communication and content interactions in order to enable JOT to increase its clients’ communication efficiency and marketing effort allocation. Current methods for online marketing prediction have failed simply because there is no single rule that can be universally applied to all markets, products and sectors. The only way to effectively find an online marketing method is to analyse user behaviour and traffic sources, taking into account the different aspects of external environmental and behavioural variables that impact it. Through analysing marketing campaign performance, JOT can obtain behaviour patterns that can be used to establish a behavioural baseline. Thanks to this, JOT will be able to predict the likely pattern for certain days or time zones with similar characteristics. Behaviour analysis could be obtained by cross-referencing geographical data with peak times, baseline traffic, daily impressions trends, real-time conversion and bounce rates, just to name a few metrics. Furthermore, in order to achieve accurate results, a vast amount of data will have to be collected so as to provide accuracy to the data sample.
JOT had planned to provide three different datasets within the project (two are proprietary and meaningful mainly only in their own business case):
• Traffic sources (Bing): Historical marketing campaign performance statistics of search data in Bing advertising platforms.
• Traffic sources (Google): Historical marketing campaign performance statistics of data in Google platform.
• Twitter trends: Trending topics as available through Twitter APIs.
With respect to the [DoA], JOT has revised its datasets to simplify the usage of their data within the EW-Shopp project without impacting the support of the services foreseen in this business case:
• the original Pixel Dataset has been unified with Traffic source Google and Traffic source Bing;
• for the Email marketing campaign dataset, the company Impacting was no longer able to provide it. JOT confirmed that the absence of this dataset does not affect the goal of the project, since it is just a complement to the Traffic Source ones, so this will not interfere with the business case success. Moreover, this has made it possible to remove, at the source, the problem related to IP and geo-localisation.
Other datasets will be added to the above-mentioned ones in order to realize the JOT business case:
• Events: A dataset covering different kinds of events (sporting, large-scale concerts, congresses, elections) for the different countries that wish to take part in the use case will be needed. This kind of dataset is provided through the Event Registry dataset.
• Weather history: This dataset will contain historical data on the weather that JOT will utilize for the project. It will show the real weather conditions, even down to a specific hour / minute, during the time period chosen for the study. This dataset is provided through the MARS (historical data) dataset.
• Weather forecast: This covers the same time period as the previous dataset, but the information will be the weather forecasted or predicted for the given times, not necessarily the actual climatic conditions.
The purpose of this business case is to carry out systematic analyses to predict the effect of different variables, such as weather and other events, on the performance of marketing campaigns. These analyses will lead to the development of different business services:
1. Event and weather-aware campaign scheduling. This service will be used by JOT to predict the very best moment to launch or run a marketing campaign based on weather conditions and events.
2. Event-based customer engagement analysis. This service supports the analysis of the possible impact of events on Online Shopping.
3. Event-based digital marketing management. This service supports intelligent bidding on digital marketing platforms, programmed based on events.
4. Weather-responsive digital marketing. This service offers intelligent bidding on digital marketing platforms, based on real-time weather conditions.
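The cross-referencing step behind these services, joining campaign performance with weather conditions for the same day, can be sketched as follows. The record layout, the field names `date` and `clicks`, and all values are invented for illustration only; they are not taken from the JOT or MARS datasets, and the sketch shows only the join, not a predictive model.

```python
from collections import defaultdict
from statistics import mean

# Invented sample rows standing in for campaign statistics and weather history.
campaign_rows = [
    {"date": "2017-06-01", "clicks": 120},
    {"date": "2017-06-02", "clicks": 80},
    {"date": "2017-06-03", "clicks": 100},
]
weather_by_date = {
    "2017-06-01": "rain",
    "2017-06-02": "sun",
    "2017-06-03": "rain",
}


def mean_clicks_by_weather(rows, weather):
    """Group daily clicks by that day's weather condition and average them."""
    groups = defaultdict(list)
    for row in rows:
        condition = weather.get(row["date"])
        if condition is not None:  # skip days without a weather record
            groups[condition].append(row["clicks"])
    return {condition: mean(values) for condition, values in groups.items()}


summary = mean_clicks_by_weather(campaign_rows, weather_by_date)
```

A real analysis would join on finer keys (location, hour) and control for other variables such as events before drawing any conclusion about weather effects.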
Chapter5 EW-Shopp Methodology forDMP
Theaimofthischapteristoprovideanexplanationofalltheinformationrequiredtodataownersinordertomakedatafindable,accessible,interoperableandre-usable(FAIR)andtosharetheprocessfollowedinEW-Shopptocollectthesedata.
5.1 ElementsofEW-ShoppDataManagementPlan
TheDMPshouldaddresssomeimportantpointsonadatasetbydatasetbasisandshouldreflectthecurrentstatusofreflectionwithintheconsortiumaboutthedatathatwillbeproduced.TheDMP,asakeyelementofgooddatamanagement,hastodescribethelifecyclemanagementappliedtothedatatobecollected,processedand/orgeneratedbyaHorizon2020project.
Inordertomakedatafindable,accessible,interoperableandre-usable(FAIR),aDMPshouldinclude:
• Dataset Identification: specifying what data will be collected, processed and/or generated.
• Dataset Origin: specifying if existing data is being re-used (if any), the origin of the data and the expected size of the data (if known).
• Dataset Format: describing the structure and type of the data, time and spatial coverage, and language and naming conventions.
• Data Access: specifying whether data will be shared/made open access. In particular, for:
  o Making data accessible: specifying if and which data produced and/or used in the project will be made openly available, moreover explaining why certain datasets cannot be shared (or need to be shared under restrictions), separating legal and contractual reasons from voluntary restrictions.
  o Making data interoperable: specifying if the data produced in the project is interoperable, that is, allowing data exchange and re-use. Moreover, specifying what data and metadata vocabularies, standards or methodologies it is meant to follow to make data interoperable.
• Data Security: specifying which provisions are in place for data security (including data recovery as well as secure storage and transfer of sensitive data). Furthermore, specifying Personal Data presence and, in that case, the privacy management procedures put in practice.
The following paragraphs give more details on the classes of attributes listed above and will be used as a guide to describe the datasets provided for EW-Shopp purposes, in accordance with the Guidelines on Data Management in Horizon 2020.
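As an illustration only, the five classes of attributes listed above can be pictured as one record per dataset. The sketch below is a minimal Python example under stated assumptions: the class `DatasetDescription` and its method `missing_sections` are hypothetical names introduced here and are not part of the EW-Shopp deliverable or platform; the sample values are taken from the Purchase Intent description in Chapter 6.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetDescription:
    """Hypothetical record with one entry per DMP attribute class."""
    identification: dict = field(default_factory=dict)
    origin: dict = field(default_factory=dict)
    format: dict = field(default_factory=dict)
    access: dict = field(default_factory=dict)
    security: dict = field(default_factory=dict)

    def missing_sections(self):
        # Names of the attribute classes that have not been filled in yet.
        return [name for name, value in asdict(self).items() if not value]


# Partial description, using values reported for the Purchase Intent dataset.
purchase_intent = DatasetDescription(
    identification={
        "Category": "Consumer data",
        "Data name": "Purchase intent",
        "Provider": "Ceneje",
    },
    origin={"Available at (M)": "M1", "Core Data (Y|N)": "N"},
)

print(purchase_intent.missing_sections())
```

A check of this kind could help a business case owner see which sections of the description table are still empty before the interview step described in Section 5.2.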
5.1.1 Dataset IDENTIFICATION
First of all, it is necessary to identify the dataset to be produced and to provide dataset details, in terms of a description of the data that will be generated or collected.
Following H2020 guidelines, a set of relevant information has been defined that can help to define the dataset identification:
• Category: Dataset typology (Market, Consumer, Products, Weather, Media).
• Data name: Name of the dataset, which should be self-explanatory.
• Description: Description of the dataset in order to provide more details.
• Provider: Name of the beneficiary providing the dataset (or being in charge of bringing it into the project).
• Contact Person: Name of the person to be contacted for further details about the dataset.
• Business Cases number: BC involved (i.e., BCx)
5.1.2 Dataset ORIGIN
Following H2020 guidelines, a set of relevant information has been defined that can help to define the dataset origin:
• Available at (M): Project month in which the dataset will be available.
• Core Data (Y|N): Indicates whether the dataset is mandatory and will be part of the data shared along the different UCs, or whether it is discretionary and has only limited usage.
• Size: A rough order of magnitude (ROM) estimation in terms of MB/GB/TB.
• Growth: A dynamic rough order of magnitude (ROM) estimate, selecting the most appropriate frequency in terms of MB/GB/TB per hour/day/week/month/other.
• Type and format: Dataset format, specifying if it is using, for example, CSV, Excel spreadsheet, XML, JSON, etc.
• Existing data (Y|N): Whether the data already exist or are generated for the project's purpose.
• Data origin: How the data in the dataset is being collected/generated (e.g. SQL table, Google API, etc.)
5.1.3 Dataset FORMAT
Following H2020 guidelines, a set of relevant information has been defined that can help to define the dataset format:
• Dataset structure: description of the structure and type of the data (e.g. the header columns, the JSON schema, REST response fields, etc.).
• Dataset format: definition of the dataset format (e.g. specifying if it is using CSV, Excel spreadsheet, XML, JSON, GeoJSON, Shapefile, HTTP stream, etc.).
• Time coverage: if the dataset has a time dimension, indication of what period it covers.
• Spatial coverage: if the dataset relates to a spatial region, indication of what its coverage is.
• Languages: languages of metadata, attributes, code lists, descriptions.
• Identifiability of data: reference to identifiability of data and standard identification mechanism.
• Naming convention: description of how the dataset can be identified if updated or after a versioning task has been performed, if the dataset is not static.
• Versioning: reference to how often the data is updated (e.g. No planned updating, Annually, Quarterly, Monthly, Weekly, Daily, Hourly, Every few minutes, Every few seconds, Real-time) and how the versioning is managed (e.g. if daily, every day a new dataset is generated with the newly created data, or every day a new dataset overrides the old one, containing all the data generated from the beginning of the collection, …)
• Metadata standards: specification of standards for metadata creation (if any). If there are no standards, description of what metadata will be created and how.
5.1.4 Dataset ACCESS
Following H2020 guidelines, a set of relevant information has been defined that can help to define the dataset access, with the aim of making data accessible and interoperable:
• Dataset license: if the dataset is released as open data, indication of the license used: CC0 [31], CC-BY [32], CC-BY-SA [33], CC-BY-ND [34], CC-BY-NC [35], CC-BY-NC-SA [36], CC-BY-NC-ND [37], PDDL [38], ODC-by [39], ODbL [40], other or proprietary (with link if possible). Otherwise, specify who has access to the dataset (for example, all partners in the consortium, some partners for the purpose of tool development, only a sample will be disclosed, etc.)
• Availability (public|private): the dataset is public or private.
• Availability to EW-Shopp partners (Y|N): the dataset is available to EW-Shopp partners.
• Availability method: specification of how the data will be made available (e.g. web page in the browser, web service (REST/SOAP APIs), query endpoint, file download, DB dump, directly shared by the responsible organization, etc.).
• Tools to access: specification of what methods or software tools are needed to access the data.
• Dataset source URL: specification of where the data and associated metadata, documentation and code are deposited (e.g. dataset source URL, etc.)
[31] https://creativecommons.org/share-your-work/public-domain/cc0/
[32] https://creativecommons.org/licenses/by/2.0/
[33] https://creativecommons.org/licenses/by-sa/2.0/
[34] https://creativecommons.org/licenses/by-nd/2.0/
[35] https://creativecommons.org/licenses/by-nc/2.0/
[36] https://creativecommons.org/licenses/by-nc-sa/2.0/
[37] https://creativecommons.org/licenses/by-nc-nd/2.0/
[38] https://opendatacommons.org/licenses/pddl/
[39] https://opendatacommons.org/category/odc-by/
[40] https://opendatacommons.org/licenses/odbl/
• Access restrictions: specification of how access will be provided in case there are any restrictions.
• Keyword/Tags: categorization of the dataset through some relevant keywords/tags (e.g. product categories, price, etc.)
• Archiving and preservation: description of the procedures that will be put in place for long-term preservation of the data. Indication of how long the data should be preserved, what its approximate end volume is, what the associated costs are and how these are planned to be covered.
• Data interoperability: specification of what data and metadata vocabularies, standards or methodologies will be followed to facilitate interoperability.
• Standard vocabulary: specification of what standard vocabulary will be used for all data types present in the dataset, to allow inter-disciplinary interoperability. If no standard vocabulary is used, a mapping to more commonly used ontologies has to be provided.
We provide some more clarifications about the approach to describe the Data interoperability and Standard vocabulary dimensions in EW-Shopp. Because of the sensitiveness of business data used in the EW-Shopp innovation action, no commitment to publish datasets provided by business partners as open data is made in [DoA]. Thus, the primary focus concerning interoperability in EW-Shopp is on supporting data integration tasks, rather than on supporting discoverability of datasets by third parties.
For this reason, in Data interoperability, we will focus on methodologies that will be adopted to support interoperability between the described dataset and other datasets. Here we will briefly describe the interoperability methodologies that we plan to use, while more details will be provided in D3.1 – Interoperability Requirements, which will be published at M8.
o Publication as linked data (RDF-ization). Linked data represented with the RDF [41] language provide support to data interoperability by: i) representing information with graph-based abstractions, often referred to as Knowledge Graphs (descriptions of typed entities, their properties and mutual relations), ii) using global identifiers for entities described in a dataset (URIs), iii) using terms (classes, properties, data types) from shared vocabularies and ontologies. Publishing a source dataset using linked data principles makes it easy to access and use the data for future integration tasks. This methodology is used in particular for EW-Shopp core data, i.e., data that are used as joints to integrate different information sources, like product data or product classification schemes, which are not available already as linked data.
o N/A (Linked Open Data). For data that are already available as linked data, we consider the interoperability methodology not applicable.
o Semantic data enrichment. This is a key pillar of the EW-Shopp approach adopted to support interoperability. Given an input dataset that is provided in a format different from RDF, and after applying suitable transformations if needed, the dataset will be semantically annotated using semantic labelling techniques. We assume that the input dataset is transformed into a table in CSV format; then, i) the headers of the table columns will be aligned with shared vocabularies (e.g., XSD used to define the data types, or predicates of Schema.org [42] used to describe offers in eCommerce portals), while ii) values will be linked to shared systems of identifiers (e.g., location identifiers from DBpedia). Annotations will support i) the enrichment of the data using the shared system of identifiers as joints, and ii) publication of the data as Knowledge Graphs represented in RDF (if useful). For example, after linking a column representing product names to EAN codes, we can retrieve the brand of each product from a linked product data source, thus enriching the original dataset. Semantic data enrichment also provides a methodology to publish data that come in tabular format as linked data. However, such a publication is not a mandatory step in semantic data enrichment.
[41] https://www.w3.org/RDF/
o References to shared systems of identifiers and standard data types. A data source is made interoperable by using shared systems of identifiers without requiring a full RDF-ization. For example, we may want to invoke weather data APIs using DBpedia identifiers for locations.
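The semantic data enrichment methodology described above can be sketched as follows. This is a minimal Python illustration, not the EW-Shopp implementation: the header-to-vocabulary mapping, the lookup tables `NAME_TO_EAN` and `EAN_TO_BRAND`, the sample product, its EAN code and its brand are all hypothetical placeholders standing in for the semantic labelling step and for a linked product data source.

```python
import csv
import io

# Assumed alignment of CSV headers with shared vocabulary terms (illustrative).
HEADER_ANNOTATIONS = {
    "NameProduct": "schema:name",
    "Date": "xsd:dateTime",
}

# Hypothetical lookup tables: product name -> EAN code, EAN code -> brand.
NAME_TO_EAN = {"Phone X": "0000000000017"}
EAN_TO_BRAND = {"0000000000017": "ExampleBrand"}


def annotate_headers(headers):
    # Step i): align each column header with a vocabulary term, if one is known.
    return {header: HEADER_ANNOTATIONS.get(header) for header in headers}


def enrich(csv_text):
    # Step ii): link values to shared identifiers, then use them as joints
    # to pull in additional attributes (here, the brand).
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ean = NAME_TO_EAN.get(row["NameProduct"])
        row["ean"] = ean
        row["brand"] = EAN_TO_BRAND.get(ean)
        rows.append(row)
    return rows


sample = (
    "NameProduct,Date\n"
    "Phone X,2015-01-01T10:00:00\n"
    "Unknown item,2015-01-02T11:00:00\n"
)
for enriched_row in enrich(sample):
    print(enriched_row)
```

Rows whose product name cannot be linked keep an empty identifier and brand, which reflects the point that enrichment depends on successful linking and that RDF publication remains an optional later step.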
For Standard vocabulary, we refer to shared vocabularies, where “shared” refers to adoption by a community of users. Among shared vocabularies we consider ISO standards, e.g., ISO 8601 [43] date formats; languages and vocabularies recommended by W3C [44], e.g., RDF or Time OWL2 [45]; but also vocabularies and systems of identifiers that are becoming de facto standards because of usage, e.g., Schema.org, DBpedia, Wikipedia. We will consider the following shared vocabularies, which will be used in the project to support interoperability:
o Terminologies from language specifications
  • Predicates, classes and data types specified in languages recommended by W3C (i.e., XSD Data Types [46], RDF, SKOS [47], RDFS [48], OWL [49]); these terms are used throughout the project, thus they will not be added to the descriptions of individual datasets.
o Classifications
  • Interlinked product classifications. This classification will be built in EW-Shopp by linking Google Categories (from Google product taxonomy), Global Product Classification by GS1 1 and GFK product categories, i.e., categories used in GFK Product Catalog 2 (GS1 categories are derived from GFK categories and the two classifications are aligned).
o Domain ontologies and shared systems of identifiers
  • Linked product data.
    § Schema-level terminology (e.g., Schema.org, GoodRelations [50])
[42] http://schema.org/
[43] https://www.iso.org/iso-8601-date-and-time-format.html
[44] https://www.w3.org/
[45] https://www.w3.org/TR/owl-time/
[46] https://www.w3.org/TR/xmlschema11-1/
[47] https://www.w3.org/2004/02/skos/
[48] https://www.w3.org/TR/rdf-schema/
[49] https://www.w3.org/TR/owl-features/
[50] http://purl.org/goodrelations/
    § Schema-level terminology and identifiers (GfK Product Catalog for retail, with internal identifiers and partially aligned to EAN codes 3)
  • Temporal ontologies. Standard vocabularies and other vocabularies and ontologies recommended by W3C to represent temporal information (e.g., ISO 8601, XSD Date and Time Data Types, Time OWL2).
  • Spatial ontologies and locations. Ontologies covering spatial schema-level terminology as well as identifiers of locations and administrative units across Europe (e.g., Basic Geo WGS84 [51], DBpedia Ontology [52], Schema.org, Geonames Ontology [53], LinkedGeoData [54], Linked Open Street Maps [55])
  • Wikipedia entities. Wikipedia provides identifiers for a very large number and variety of entities described in Wikipedia, which are adopted by a very large community of data providers and consumers. With Wikipedia entities, we refer also to identifiers used in data sources derived from Wikipedia (e.g., DBpedia) or linked to Wikipedia identifiers (e.g., WikiData [56]). While identifiers of locations play a prominent role in EW-Shopp and are covered by spatial locations, here we refer to entities of different types, used, e.g., to annotate events.
5.1.5 Data SECURITY
Following H2020 guidelines, a set of relevant information has been defined that can help to define the dataset security:
• Personal Data (Y|N): Confirmation about personal data presence in the dataset.
• Anonymized (Y|N|NA): Confirmation of whether personal data is anonymized.
• Data recovery and secure storage: Information about how data recovery and secure storage are managed.
• Privacy management procedures: Specification of the procedures adopted in order to manage privacy.
• PD At The Source (Y|N): Confirmation of whether Personal data is present at the source.
• PD - Anonymised during project (Y|N): Confirmation of whether Personal data is anonymised during the project.
• PD - Anonymised before project (Y|N): Confirmation of whether Personal data is anonymised before the project.
• Level of Aggregation (for PD anonymized by aggregation): Indication of the level of aggregation used to allow Personal data anonymization.
[51] https://www.w3.org/2003/01/geo/
[52] http://dbpedia.org/ontology/
[53] http://www.geonames.org/ontology/documentation.html
[54] http://linkedgeodata.org/ontology
[55] Auer, Sören, Jens Lehmann, and Sebastian Hellmann. "LinkedGeoData: Adding a spatial dimension to the web of data." The Semantic Web - ISWC 2009 (2009): 731-746.
[56] https://www.wikidata.org
5.2 Process to collect dataset details
The goal of collecting all the information described in the previous paragraphs has been achieved, with respect to the EW-Shopp datasets, through the process described below.
The first step was to set up a table with the main sections of the Dataset description: Dataset Identification, Dataset origin, Dataset format, Dataset access and Dataset security. Each of these sections was further decomposed to contain all the information described in the related paragraphs of this Chapter 5.
The second step consisted of preparing a survey in the form of a textual description (see Annex A – DMP Survey), with the aim of giving a clear understanding of all the required information and easing the completion of the table.
The third step was a collection process, in which each Business case owner filled in the table and was then interviewed by a technical partner to discuss the information provided.
At the end of the process, all the information collected was merged into an integrated spreadsheet. The same information will be discussed in the following chapter, using a table format in order to ease the understanding of each dataset description.
Chapter 6 Dataset description
The aim of this chapter is to provide, for each dataset, a description that covers all the information listed in Chapter 5, in accordance with the Guidelines on FAIR Data Management in Horizon 2020 and with ethics and legal requirements. As can be seen in the following paragraphs, “dataset” refers to individual datasets but also to families of datasets with the same structure created at different moments in time or under other discriminating conditions.
6.1 CE Dataset - Consumer Data: Purchase Intent
6.1.1 Dataset IDENTIFICATION
The dataset “Purchase Intent” is proprietary and contains user journey metrics and logs.
Table 5. DATASET IDENTIFICATION – Purchase Intent
Category: Consumer data
Data name: Purchase intent
Description: A collection of user journey data: pageviews, search terms, redirects to sellers and similar. Data is logged to local databases and we provide data from 1 January 2015. Local databases consist of SQL databases and NoSQL databases.
Provider: Ceneje
Contact Person: David Creslovnik, Uros Mevc
Business Cases number: BC1
6.1.2 Dataset ORIGIN
This dataset is available from January 2017 and it cannot be defined as “core data”. The dataset already existed.
Table 6. DATASET ORIGIN – Purchase Intent
Available at (M): M1
Core Data (Y|N): N
Size: Local: 65 million pageviews, 17 million deeplinks
Growth: 300000 pageviews per day, 15000 searches per day, 25000 redirects per day
Type and format: structured documents, TSV
Existing data (Y|N): Y
Data origin: SQL tables, NoSQL documents
6.1.3 Dataset FORMAT
The dataset has a TSV (SQL) or JSON (NoSQL) format; the data structure is illustrated in the following table. It collects data since 2015, is not language specific, and covers information at country level. The data is updated daily, which means that every day the dataset contains only the newly generated data.
Table 7 DATASET FORMAT – Purchase Intent
Dataset structure:
  *SQL tables*
  Product pageviews
  - IdProduct (INT)
  - NameProduct (STRING)
  - L1 (STRING): Level 1 category
  - L2 (STRING): Level 2 category
  - L3 (STRING): Level 3 category
  - IdUsers (INT)
  - Date (DATETIME)
  Product deeplinks (redirects to sellers)
  - IdProduct (INT)
  - NameProduct (STRING)
  - L1 (STRING): Level 1 category
  - L2 (STRING): Level 2 category
  - L3 (STRING): Level 3 category
  - IdUsers (INT)
  - IdSeller (INT)
  - Date (DATETIME)
  *NoSQL documents*
  Page search
  {
    "_id": (ObjectId),
    "IdUsers": (INT),
    "TimeStamp": (ISODate),
    "Search": {
      "NumberOfResults": (INT),
      "Query": (STRING)
    }
  }
Dataset format: SQL: tsv; NoSQL: json
Time coverage: since 2015
Spatial coverage: Country
Languages: not language specific
Identifiability of data: Yes
Naming convention: /{country}/YYYY/MM/DD.tsv
Versioning: Daily (every day the dataset contains only the data newly generated)
Metadata standards: N/A
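To make the naming convention and the daily versioning concrete, the following minimal Python sketch builds the relative path of each daily file. It assumes that `{country}` is filled with a short country code such as `si`; the deliverable does not specify the actual values, and the function names are hypothetical.

```python
from datetime import date, timedelta


def daily_file_path(country, day):
    # Follows the stated convention /{country}/YYYY/MM/DD.tsv.
    return f"/{country}/{day:%Y/%m/%d}.tsv"


def paths_for_range(country, start, end):
    # Daily versioning: one file per day, holding only that day's new data,
    # so a time range maps to one path per calendar day (both ends included).
    days = (end - start).days
    return [daily_file_path(country, start + timedelta(days=i)) for i in range(days + 1)]


print(daily_file_path("si", date(2015, 1, 1)))
print(paths_for_range("si", date(2015, 1, 30), date(2015, 2, 1)))
```

Because each daily file contains only newly generated data, reconstructing a longer period means fetching and concatenating all files in the range rather than reading the latest one.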
6.1.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available through file download by means of WGET/Curl. The dataset will be deposited on AWS or on the Ceneje static content server, and access is provided through credentials.
Table 8 MAKING DATA ACCESSIBLE – Purchase Intent
Dataset license: Owner: Ceneje; Access: All members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Y
Availability method: File download (zip)
Tools to access: WGET/Curl
Dataset source URL: AWS or Ceneje static content server
Access restrictions: Credentials
Keyword/Tags: N/A
Archiving and preservation: NO (can be generated on demand)
Table 9 MAKING DATA INTEROPERABLE – Purchase Intent
Data interoperability:
  • Semantic data enrichment
Standard vocabulary:
  • Interlinked product classification
  • Linked product data
  • Temporal ontologies
6.1.5 Dataset SECURITY
The dataset does not contain personal data because these were anonymized before being used in the project. Secure storage and regular backups are expected.
Table 10 DATASET SECURITY - Purchase Intent
Personal Data (Y|N): N
Anonymized (Y|N|NA): Y
Data recovery and secure storage: Secure storage, regular backups
Privacy management procedures: N/A
PD at the source (Y|N): Y
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): Y
Level of Aggregation (for PD anonymized by aggregation): UserId level (anonymous)
6.1.6 Ethics and Legal requirements
The source of the data contains PD, but data are anonymized before the project and shared within the project without PD. Since Ceneje has already notified their Data Protection Officer (DPO) that no PD will be shared, they do not need to get an additional opinion. The notification to the Data Protection Officer is included in deliverable [D7.2].
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.2 ME Dataset - Consumer Data: Location analytics data (Hourly)
6.2.1 Dataset IDENTIFICATION
The dataset “Location analytics data”, provided by Measurence, focuses on the hourly number of devices with WiFi enabled that pass through an area covered by Measurence WiFi sensors.
Table 11. DATASET IDENTIFICATION – Location analytics data
Category: Consumer Data
Data name: Location analytics data
Description: Hourly number of devices with WiFi enabled that pass through an area covered by Measurence WiFi sensors
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC3
6.2.2 Dataset ORIGIN
This dataset is available from January 2017 and it cannot be defined as “core data”. It is exposed through APIs in JSON format, with a size of ~600 GB and a growth of ~5 GB/location/month. The dataset already existed before the project.
Table 12. DATASET ORIGIN – Location analytics data
Available at (M): M1
Core Data (Y|N): N
Size: ~600 GB
Growth: ~5 GB/location/month
Type and format: APIs - JSON format
Existing data (Y|N): Y
Data origin: Proprietary sensors
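The size and growth figures above can be combined into a rough projection of storage needs. The sketch below is only an arithmetic illustration in Python: the number of locations (10) and the time horizon (12 months) are hypothetical inputs, since the deliverable does not state how many locations are monitored, and the function name is introduced here for illustration.

```python
def projected_size_gb(current_gb, growth_gb_per_location_month, locations, months):
    # Linear rough-order-of-magnitude estimate:
    # current size + growth rate * number of locations * number of months.
    return current_gb + growth_gb_per_location_month * locations * months


# ~600 GB today, ~5 GB/location/month, assuming 10 locations over 12 months.
print(projected_size_gb(600, 5, 10, 12))  # 600 + 5 * 10 * 12 = 1200
```

Under these assumed inputs the dataset would roughly double in a year; with a different number of locations the estimate scales linearly.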
6.2.3 Dataset FORMAT
The dataset has a JSON and CSV format. It collects numerical data gathered since 2015 and it covers information related to zip code, coordinates, address, county, city and country. The data is updated daily, which means that every day the dataset contains only the newly generated data.
Table 13 DATASET FORMAT – Location analytics data
Dataset structure: N/A because there is no access to the data through URL
Dataset format: JSON and CSV
Time coverage: starting from 2015
Spatial coverage: zip code, coordinates, address, county, city, country
Languages: EN (numerical data)
Identifiability of data: No. Raw data contains a hashed version of the real MAC address, which is anonymized at the source
Naming convention: /location_id/YYYY/MM/DD/HH
Versioning: Daily (every day the dataset contains only the data newly generated)
Metadata standards: N/A
6.2.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available through an API by means of an authenticated encrypted channel.
Table 14 MAKING DATA ACCESSIBLE – Location analytics data
Dataset license: Owner: ME. Access: members
Availability (public|private): private
Availability to EW-Shopp partners (Y|N): Y
Availability method: API
Tools to access: Authenticated encrypted channel
Dataset source URL: API endpoint
Access restrictions: Credentials/API keys
Keyword/Tags: presence data, location intelligence
Archiving and preservation: Lifetime archive of raw data. The APIs always use the last version of the algorithm
Table 15 MAKING DATA INTEROPERABLE – Location analytics data
Data interoperability:
  • Semantic data enrichment
Standard vocabulary:
  • Temporal ontologies
  • Spatial ontologies and locations
6.2.5 Dataset SECURITY
The dataset does not contain personal data because these data were anonymized at the source. Data recovery and secure storage are expected.
Table 16 DATASET SECURITY - Location analytics data
Personal Data (Y|N): N
Anonymized (Y|N|NA): Y, prior to storing data in a database (No PD is stored in any database)
Data recovery and secure storage: Y
Privacy management procedures: All the data are anonymized before storage (read paragraph 6.2.6)
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.2.6 Ethics and Legal requirements
The MAC addresses that Measurence's sensors collect (which can be unique identifiers of WiFi transmitters) are hashed with the cryptographic hash function SHA-256, a member of the SHA-2 set of cryptographic hash functions [57] designed by the United States National Security Agency (NSA). Measurence followed a privacy by design approach, so after hashing has been performed, the hashed MAC address is sent to our servers and the original MAC address gets discarded directly by the sensor: we never store the real MAC address on our servers. Given a hashed MAC address, there is no way to reconstruct the corresponding original MAC address, other than attempting a brute force attack (which, obviously, is applicable to any cryptographic function). Based on the above description, this dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of the opinion is not required to be collected.
[57] https://simple.wikipedia.org/wiki/Cryptographic_hash_function
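The hashing step described above can be sketched with Python's standard `hashlib` module. This is an illustration under stated assumptions, not Measurence's sensor code: the function name `hash_mac`, the normalization of the address format and the sample address are hypothetical, and the deliverable does not say whether any salt or other pre-processing is applied before hashing.

```python
import hashlib


def hash_mac(mac):
    # Normalize the textual form so the same device always maps to the same
    # digest (this normalization is an assumption, not stated in the source).
    normalized = mac.strip().lower().replace("-", ":")
    # SHA-256 digest, hex encoded; only this value would leave the sensor.
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


hashed = hash_mac("AA-BB-CC-DD-EE-FF")  # hypothetical MAC address
print(len(hashed))  # a SHA-256 digest is 64 hexadecimal characters
```

The digest is deterministic, which is what allows counting distinct devices per hour or per day without keeping the original address.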
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.3 ME Dataset - Consumer Data: Location analytics data (Daily)
6.3.1 Dataset IDENTIFICATION
The dataset “Location analytics data”, provided by Measurence, focuses on the daily number of devices with WiFi enabled that pass through an area covered by Measurence WiFi sensors.
Table 17. DATASET IDENTIFICATION – Location analytics data
Category: Consumer Data
Data name: Location analytics data
Description: Daily number of devices with WiFi enabled that pass through an area covered by Measurence WiFi sensors
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC3
6.3.2 Dataset ORIGIN
This dataset is available from January 2017 and it cannot be defined as “core data”. It is exposed through APIs in JSON format, with a size of ~600 GB and a growth of ~5 GB/location/month. The dataset already existed before the project.
Table 18. DATASET ORIGIN – Location analytics data
Available at (M): M1
Core Data (Y|N): N
Size: ~600 GB
Growth: ~5 GB/location/month
Type and format: APIs - JSON format
Existing data (Y|N): Y
Data origin: Proprietary sensors
6.3.3 Dataset FORMAT
The dataset has a JSON and CSV format. It collects numerical data gathered starting from 2015 and it covers information related to zip code, coordinates, address, county, city and country. The data is updated daily, which means that every day the dataset contains only the newly generated data.
Table 19 DATASET FORMAT – Location analytics data
Dataset structure: N/A because there is no access to the data through URL
Dataset format: JSON and CSV
Time coverage: starting from 2015
Spatial coverage: zip code, coordinates, address, county, city, country
Languages: EN (numerical data)
Identifiability of data: No. Raw data contains a hashed version of the real MAC address, which is anonymized at the source
Naming convention: /location_id/YYYY/MM/DD/
Versioning: Daily (every day the dataset contains only the data newly generated)
Metadata standards: N/A
6.3.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available through an API by means of an authenticated encrypted channel.
Table 20 MAKING DATA ACCESSIBLE – Location analytics data
Dataset license: Owner: ME. Access: members
Availability (public|private): Private
Availability to EW-Shopp partners (Y|N): Y
Availability method: API
Tools to access: Authenticated encrypted channel
Dataset source URL: TBD/API endpoint
Access restrictions: Credentials/API keys
Keyword/Tags: presence data, location intelligence
Archiving and preservation: Lifetime archive of raw data. The APIs always use the last version of the algorithm
Table 21 MAKING DATA INTEROPERABLE – Location analytics data
Data interoperability:
  • Semantic data enrichment
Standard vocabulary:
  • Temporal ontologies
  • Spatial ontologies and locations
6.3.5 Dataset SECURITY
The dataset does not contain personal data because these data were anonymized at the source. Data recovery and secure storage are expected.
Table 22 DATASET SECURITY - Location analytics data
Personal Data (Y|N): N
Anonymized (Y|N|NA): Y, prior to storing data in a database (No PD is stored in any database)
Data recovery and secure storage: Y
Privacy management procedures: All the data are anonymized before storage (read paragraph 6.3.6)
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.3.6 Ethics and Legal requirements
The MAC addresses that Measurence's sensors collect (which can be unique identifiers of WiFi transmitters) are hashed with the cryptographic hash function SHA-256, a member of the SHA-2 set of cryptographic hash functions designed by the United States National Security Agency (NSA). Measurence followed a privacy by design approach, so after hashing has been performed, the hashed MAC address is sent to our servers and the original MAC address gets discarded directly by the sensor: we never store the real MAC address on our servers. Given a hashed MAC address, there is no way to reconstruct the corresponding original MAC address, other than attempting a brute force attack (which, obviously, is applicable to any cryptographic function). Based on the above description, this dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of the opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.4 BB Dataset - Consumer Data: Customer Purchase History
6.4.1 Dataset IDENTIFICATION
The dataset “Customer purchase history” is proprietary and contains data on customers and their purchases.
Table 23. DATASET IDENTIFICATION – Customer Purchase History
Category: Consumer data
Data name: Customer purchase history
Description: Sell out data matched with customer baskets in a defined time frame.
Provider: Big Bang
Contact Person: Matija Torlak
Business Cases number: BC1
6.4.2 Dataset ORIGIN
This dataset is available from January 2017 and it cannot be defined as “core data”. It has a size of 29000 products and a growth of 2000 new products per year. The dataset already existed before the project.
Table 24 DATASET ORIGIN – Customer Purchase History
Available at (M): M1
Core Data (Y|N): N
Size: 29000 products
Growth: 2000 new products per year
Type and format: structured tabular data
Existing data (Y|N): Y
Data origin: Google Analytics, DWH (SQL tables), Excel structured data
6.4.3 Dataset FORMAT
The dataset has a CSV/XLS format. It collects data gathered since 2013 and it covers information in total or per store location (18 stores + web). The data is updated daily and contains both the newly generated data and the history.
Table 25 DATASET FORMAT – Customer Purchase History
Dataset structure: BB Classification (can be matched with GPC Classification); purchase data table, structured (SQL)
Dataset format: CSV/XLS
Time coverage: since 2013
Spatial coverage: Total or per store location (18 stores + web)
Languages: Slovenian
Identifiability of data: Yes
Naming convention: /{country}/companyname/purchaseid.json
Versioning: daily (new + history)
Metadata standards: Google Analytics
6.4.4 Dataset ACCESS
The dataset is public, but access is restricted by username and password. The data will be made available through download.
Table 26 MAKING DATA ACCESSIBLE – Customer Purchase History
Dataset license: Admin - Full User (Owner); access for all members through username and password
Availability (public|private): public (password, username restricted)
Availability to EW-Shopp partners (Y|N): Y
Availability method: Download, view, edit (based on license)
Tools to access: Accessible on web
Dataset source URL: BB virtual server
Access restrictions: Credentials
Keyword/Tags: OrderId, ProductId, StoreId, …. Same as the Sample
Archiving and preservation: Can be generated on demand
Table 27 MAKING DATA INTEROPERABLE – Customer Purchase History
Data interoperability:
  • Publication as linked data (RDF-ization)
  • Semantic data enrichment
Standard vocabulary:
  • Interlinked product classification
  • Linked product data
  • Temporal ontologies
6.4.5 Dataset SECURITY
The dataset does not contain personal data because these data were anonymized at the source. Secure storage and constant download options are expected.
Table 28 DATASET SECURITY – Customer Purchase History
Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: Secure storage, constant download options
Privacy management procedures: Personal data will not be processed during the project. All data are returned by an analytics engine that will not provide PD.
PD at the source (Y|N): Y
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.4.6 Ethics and Legal requirements
The source of the data contains PD, but data are anonymized before the project and shared within the project without PD. Since Big Bang has already notified their Data Protection Officer (DPO) that no PD will be shared, they do not need to get an additional opinion. The notification to the Data Protection Officer is included in deliverable [D7.2].
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.5 BB Dataset - Consumer Data: Consumer Intent and Interaction
6.5.1 Dataset IDENTIFICATION
The dataset “Consumer intent and interaction” is proprietary and contains data on customer journeys recorded using Google Analytics.
Table 29. DATASET IDENTIFICATION – Consumer Intent and Interaction
Category: Consumer data
Data name: Consumer intent and interaction
Description: A collection of user journey data from Google Analytics - pageviews, page events, search terms, redirects to channels, etc. Data has been recorded since December 2012.
Provider: Big Bang
Contact Person: Matija Torlak
Business Cases number: BC1
6.5.2 Dataset ORIGIN
This dataset is available from January 2017 and it cannot be defined as “core data”. The dataset already existed before the project.
Table 30 DATASET ORIGIN - Consumer Intent and Interaction
Available at (M): M1
Core Data (Y | N): N
Size: 130 million pageviews, 20 million sessions, 8 million users, 70000 transactions (since December 2012)
Growth: 10.000 users per day
Type and format: numeric
Existing data (Y | N): Y
Data origin: Google Analytics
6.5.3 Dataset FORMAT
The dataset has a CSV format. It collects data gathered since 2013 and it covers the whole world. The data is updated daily and contains both the newly generated data and the history.
Table 31 DATASET FORMAT – Consumer Intent and Interaction
Dataset structure: Google Analytics specified
Dataset format: CSV, XLS
Time coverage: since 2013
Spatial coverage: Global
Languages: not language specific
Identifiability of data: Yes
Naming convention: N/A
Versioning: daily (new + history)
Metadata standards: Google Analytics
6.5.4 Dataset ACCESS
The dataset is public, but access is protected by username and password. The data will be made available through download.
Table 32 MAKING DATA ACCESSIBLE – Consumer Intent and Interaction
Dataset license: Admin - Full User (Owner); access for all members through password and username
Availability (public | private): public (password, username restricted)
Availability to EW-Shopp partners (Y | N): Y
Availability method: Download, view, edit (based on license)
Tools to access: N/A
Dataset source URL: BB virtual server
Access restrictions: Credentials
Keyword/Tags: Google search tags
Archiving and preservation: N/A because data is used just for analytical
Table 33 MAKING DATA INTEROPERABLE – Consumer Intent and Interaction
Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
• Temporal ontologies
6.5.5 Dataset SECURITY
The dataset does not contain personal data. Secure storage and constant download options are expected.
Table 34 DATASET SECURITY – Consumer Intent and Interaction
Personal Data (Y | N): N
Anonymized (Y | N | NA): N/A
Data recovery and secure storage: Secure storage, backup
Privacy management procedures: Google Analytics data only, so no PD included. In this case data is on the level of product / categories / page.
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N
PD - anonymised before project (Y | N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.5.6 Ethics and Legal requirements
Based on the above dataset description, the dataset does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of an opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.6 ME Dataset - Consumer Data: Location analytics data (Weekly)
6.6.1 Dataset IDENTIFICATION
The dataset “Location analytics data”, provided by Measurence, reports the weekly number of WiFi-enabled devices that pass through an area covered by Measurence WiFi sensors.
Table 35. DATASET IDENTIFICATION – Location analytics data
Category: Consumer Data
Data name: Location analytics data
Description: Weekly number of devices with WiFi enabled that pass through an area covered by Measurence WiFi sensors
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC3
6.6.2 Dataset ORIGIN
This dataset is available from January 2017 and it cannot be defined as “core data”. It has an APIs - JSON format with a size of ~600 GB and a growth of ~5 GB / location / month. The dataset already existed before the project.
Table 36. Dataset ORIGIN – Location analytics data
Available at (M): M1
Core Data (Y | N): N
Size: ~600 GB
Growth: ~5 GB / location / month
Type and format: APIs - JSON format
Existing data (Y | N): Y
Data origin: Proprietary sensors
6.6.3 Dataset FORMAT
The dataset has a JSON and CSV format. It collects numerical data gathered starting from 2015 and it covers information related to zip code, coordinates, address, county, city and country. The data is updated daily, which means that every day the dataset contains only the newly generated data.
Table 37 DATASET FORMAT – Location analytics data
Dataset structure: N/A because there is no access to the data through URL
Dataset format: JSON and CSV
Time coverage: starting from 2015
Spatial coverage: zip code, coordinates, address, county, city, country
Languages: EN (numerical data)
Identifiability of data: No. Raw data contains a hashed version of the real MAC address, which is anonymized at the source
Naming convention: /location_id/YYYY/weeknum
Versioning: Daily (every day the dataset contains only the data newly generated)
Metadata standards: N/A
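The naming convention /location_id/YYYY/weeknum can be illustrated with a small helper. This is only a sketch under stated assumptions, not Measurence's implementation: the function name `weekly_path`, the use of ISO 8601 week numbering, and the zero-padded two-digit week are all hypothetical choices, since the plan gives only the pattern itself.

```python
from datetime import date


def weekly_path(location_id: str, day: date) -> str:
    """Build a /location_id/YYYY/weeknum path for the week containing `day`.

    Assumption: weeks follow ISO 8601 numbering, so the year component is
    the ISO year, which can differ from the calendar year around New Year.
    """
    iso_year, iso_week, _weekday = day.isocalendar()
    # Assumption: the week number is zero-padded to two digits.
    return f"/{location_id}/{iso_year}/{iso_week:02d}"


# Hypothetical location identifier, used only for illustration.
example = weekly_path("loc42", date(2017, 1, 2))
```

With ISO numbering, a date such as Sunday 1 January 2017 falls in week 52 of 2016, so a different week definition would need to be agreed on if the provider counts weeks from 1 January.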
6.6.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available through an API over an authenticated, encrypted channel.
Table 38 MAKING DATA ACCESSIBLE – Location analytics data
Dataset license: Owner: ME. Access: members
Availability (public | private): Private
Availability to EW-Shopp partners (Y | N): Y
Availability method: API
Tools to access: Authenticated encrypted channel
Dataset source URL: TBD / API endpoint
Access restrictions: Credentials / API keys
Keyword/Tags: presence data, location intelligence
Archiving and preservation: Lifetime archive of raw data. The APIs always use the last version of the algorithm
Table 39 MAKING DATA INTEROPERABLE – Location analytics data
Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
6.6.5 Dataset SECURITY
The dataset does not contain personal data because these data were anonymized at the source. Data recovery and secure storage are expected.
Table 40 DATASET SECURITY - Location analytics data
Personal Data (Y | N): N
Anonymized (Y | N | NA): Y, prior to storing data in a database (no PD is stored in any database)
Data recovery and secure storage: Y
Privacy management procedures: All the data are anonymised before storage (read paragraph 6.6.6)
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N
PD - anonymised before project (Y | N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.6.6 Ethics and Legal requirements
The MAC addresses that Measurence's sensors collect (which can be unique identifiers of WiFi transmitters) are hashed with the 256-bit variant of SHA-2 (SHA-256); SHA-2 is a set of cryptographic hash functions designed by the United States National Security Agency (NSA). Measurence followed a privacy-by-design approach: after hashing has been performed, the hashed MAC address is sent to Measurence's servers and the original MAC address is discarded directly by the sensor, so the real MAC address is never stored on the servers. Given a hashed MAC address, there is no way to reconstruct the corresponding original MAC address other than attempting a brute-force attack (which, obviously, is applicable to any cryptographic function). Based on the above description, this dataset does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of an opinion is not required to be collected.
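The hashing step described above can be sketched as follows. This is an illustration only, not the code running on Measurence's sensors: the function name `hash_mac`, the normalization of the MAC address to six raw bytes, and the absence of a salt or key are assumptions, since the plan states only that SHA-256 is applied before the address leaves the sensor.

```python
import hashlib


def hash_mac(mac: str) -> str:
    """Return the SHA-256 hex digest of a MAC address.

    Assumption: the address is normalized to its 6 raw bytes first, so
    'AA:BB:CC:DD:EE:FF' and 'aa-bb-cc-dd-ee-ff' map to the same digest.
    Assumption: no salt or key is mixed in; the source does not mention one.
    """
    normalized = mac.replace(":", "").replace("-", "").lower()
    raw = bytes.fromhex(normalized)  # raises ValueError on non-hex input
    if len(raw) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    return hashlib.sha256(raw).hexdigest()


# Only the digest would be transmitted; the original address is discarded.
digest = hash_mac("AA:BB:CC:DD:EE:FF")
```

The digest is deterministic, which is what allows the same device to be counted once across sensors without the original address being kept.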
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.7 BT Dataset - Customer Communication Data: Contact and Consumer Interaction History
6.7.1 Dataset IDENTIFICATION
The dataset “Contact and Consumer Interaction history” is proprietary and contains data on communications with customers.
Table 41. DATASET IDENTIFICATION – Contact and Consumer Interaction History
Category: Customer Communication Data
Data name: Contact and Consumer Interaction History
Description: The dataset contains the following data:
• calls
  o every outbound call; successful or not (every attempt counts)
  o every inbound call; successful or not
  o every simulated call
• other contact events
  o every inbound email, SMS, click-through, fax, scan, or any other document
  o every outbound email, SMS, fax, or any other sent document
• other events
  o a record of agent's time spent on waiting for a contact
  o a record of every time an agent logs in or out
  o a record of every time an agent joins or leaves a campaign
  o a record of every CCServer (CDE COCOS CEP Contact Center Server) startup or shutdown
Using this data, it is possible to create statistics and reports regarding telephony and performance of single agents, groups of agents, campaigns and call center. Nearly all the reports provided by CCServer are made from this table. Although this table isn't meant to serve as a basis for content-related reports (i.e., interview statistics), there are some fields in the table that may be used for this kind of report as well.
Dataset data are either generated from the CCServer system or collected from the contact signaling (protocol). The data are intended for handling the Customer Engagement Platform (CEP) campaigns; they are already used for this purpose and are intended for the same purpose in the future. Existing data carries all information about realized connection types and services and will be reused and upgraded with new communication channels, trends and services.
Provider: Browsetel / CDE
Contact Person: Matej Žvan, Aleš Štor
Business Cases number: BC1
6.7.2 Dataset ORIGIN
This dataset is available from March 2017 and it can be defined as “core data”. Its size is 5-20 GB with a growth of 5-20 GB/year. The dataset already existed before the project.
Table 42 DATASET ORIGIN – Contact and Consumer Interaction History
Available at (M): M3
Core Data (Y | N): Y
Size: 5-20 GB
Growth: 5-20 GB/year
Type and format: Current format is SQL, target format CSV UTF-8 text file (compressed)
Existing data (Y | N): Y
Data origin: Contact center and Customer Interaction Management data
6.7.3 Dataset FORMAT
The dataset has CSV UTF-8 format. It covers information related to Slovenia, in English. The data is updated monthly.
Table 43 DATASET FORMAT – Contact and Consumer Interaction History
Dataset structure: RAW data. Optimized data from the system “Call History” table and history from Customer Interaction Management. Records describing contacts can be described by additional information records. Fields: EVENTID, CAMPAIGN, RESULT_CCS, RESULT_CODE, CALL_PRIORITY, ATTEMPT_NR, MANUAL_MODE, CCS_ENDSTATE, COST, CONTACT_COUNT, FOR_APPOINTMENT, CALL_TYPE, CALL_DIRECTION, DISC_CAUSE, DISC_CAUSE_DESC, QUEUE_SIZE, ALL_QUEUE_SIZE, DISC_BY_CUSTOMER, CUSTOM_DATA, CALLED_NUMBER, VRU_NUMBER, TRANSFERS, REJECTS, IGNORES, ..., CALL_REASON, EVENT_SERVICE_ORIGIN, EVENT_ORIGIN, EVENT_TYPE, EVENT_DATE, EVENT_LOCATION, MEDIA_TYPE, TOTAL_TIME, CONVERSATION_TIME
Dataset format: CSV UTF-8
Time coverage: 1 year (at the start), updated during the project duration
Spatial coverage: Slovenia
Languages: English
Identifiability of data: Persistent and unique identifiers are used, e.g. EVENT_ID, CAMPAIGN_ID, CHANNEL_ID…
Naming convention: Not used
Versioning: Monthly
Metadata standards: Proprietary solution in form of relational tables
6.7.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available from Secure FTP in compressed CSV UTF-8.
Table 44 MAKING DATA ACCESSIBLE – Contact and Consumer Interaction History
Dataset license: No licensing for the time of EW-Shopp project duration. Access via ACL is enabled for all partners in the consortium
Availability (public | private): private
Availability to EW-Shopp partners (Y | N): Y
Availability method: Data available from Secure FTP in compressed CSV UTF-8.
Tools to access: Secure FTP Client
Dataset source URL: Browsetel, secure file server
Access restrictions: Credentials
Keyword/Tags: Contacts
Archiving and preservation: Data will be preserved for the time of EW-Shopp project duration. End volume is approximated to be 20 GB.
Table 45 MAKING DATA INTEROPERABLE – Contact and Consumer Interaction History
Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
6.7.5 Dataset SECURITY
The dataset does not contain PD because PD was removed at the source.
Table 46 DATASET SECURITY – Contact and Consumer Interaction History
Personal Data (Y | N): N
Anonymized (Y | N | NA): N
Data recovery and secure storage: N
Privacy management procedures: Caller number is ignored and not recorded (not needed in analytical processing)
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N
PD - anonymised before project (Y | N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.7.6 Ethics and Legal requirements
Based on the above dataset description, the dataset does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of an opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.8 ECMWF Dataset - Weather: MARS Historical Data
6.8.1 Dataset IDENTIFICATION
The dataset “MARS Historical Data” is proprietary and contains meteorological data.
Table 47. DATASET IDENTIFICATION – MARS Historical Data
Category: Weather
Data name: Meteorological Archival and Retrieval System (MARS) Historical Data
Description: Meteorological archive of forecasts of the past 35 years and sets of reanalysis forecasts.
Provider: European Centre for Medium-Range Weather Forecasts (ECMWF)
Contact Person: Aljaž Košmerlj
Business Cases number: BC1, BC2, BC3, BC4
6.8.2 Dataset ORIGIN
This dataset is available from April 2017 and it can be defined as “core data”. Its size is >85 PT. The dataset already existed before the project.
Table 48 DATASET ORIGIN – MARS Historical Data
Available at (M): M4
Core Data (Y | N): Y
Size: >85 PT
Growth: Complete status of atmosphere twice a day
Type and format: structured, CSV
Existing data (Y | N): Y
Data origin: ECMWF MARS API
6.8.3 Dataset FORMAT
The dataset has CSV format. It covers information related to the whole Earth, in English. The data is updated in real time.
Table 49 DATASET FORMAT – MARS Historical Data
Dataset structure: N/A
Dataset format: CSV
Time coverage: past 35 years
Spatial coverage: Global
Languages: English
Identifiability of data: Yes
Naming convention: /{country}/YYYY/MM/DD.CSV
Versioning: Real-time
Metadata standards: N/A
6.8.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available by API access.
Table 50 MAKING DATA ACCESSIBLE – MARS Historical Data
Dataset license: Owner: ECMWF. Access: All members
Availability (public | private): private
Availability to EW-Shopp partners (Y | N): Y
Availability method: API access
Tools to access: REST API, Python API
Dataset source URL: http://apps.ecmwf.int/mars-catalogue/
Access restrictions: Credentials
Keyword/Tags: weather, climate
Archiving and preservation: ECMWF maintained archive
Table 51 MAKING DATA INTEROPERABLE – MARS Historical Data
Data interoperability:
• Semantic data enrichment
• References to shared systems of identifiers and standard data types
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities
6.8.5 Dataset SECURITY
The dataset does not contain PD.
Table 52 DATASET SECURITY – MARS Historical Data
Personal Data (Y | N): N
Anonymized (Y | N | NA): N/A
Data recovery and secure storage: yes, both managed by ECMWF
Privacy management procedures: N/A
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N/A
PD - anonymised before project (Y | N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A
6.8.6 Ethics and Legal requirements
Based on the above dataset description, the dataset “MARS Historical Data” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of an opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset.
6.9 CE Dataset - Products and Categories: Product Attributes
6.9.1 Dataset IDENTIFICATION
The dataset “Product attributes” is proprietary and contains information about individual attributes for various products.
Table 53. DATASET IDENTIFICATION – Product Attributes
Category: Products and categories
Data name: Product attributes
Description: A collection of product attributes (varying from generic such as name, EAN, brand, categorization and color to more specific as dimensions or technical specifications). Data is collected from more than one thousand online stores in 5 countries and then automatically and manually merged into an organized dataset.
Provider: Ceneje
Contact Person: David Creslovnik, Uros Mevc
Business Cases number: BC1
6.9.2 Dataset ORIGIN
This dataset is available from January 2017 and it can be defined as “core data”. The dataset already existed before the project.
Table 54 DATASET ORIGIN - Product Attributes
Available at (M): M1
Core Data (Y | N): Y
Size: 12 million products, 10 million product specifications
Growth: 10000 new products per day, 7000 product specifications per day
Type and format: structured tabular data
Existing data (Y | N): Y
Data origin: SQL tables
6.9.3 Dataset FORMAT
The dataset collects data starting from 2016, with country-level spatial coverage, in Slovenian, Croatian and Serbian. The data is updated daily.
Table 55 DATASET FORMAT – Product Attributes
Dataset structure: Product attributes
- IdProduct (INT)
- NameProduct (STRING)
- L1 (STRING)
- L2 (STRING)
- L3 (STRING)
- AttName (STRING)
- AttValue (STRING)
Dataset format: SQL: tabular
Time coverage: since 2016
Spatial coverage: Country
Languages: Slovenian, Croatian, Serbian
Identifiability of data: Yes
Naming convention: /{country}/product_attributes.tsv
Versioning: Daily (every day the dataset contains full generated data)
Metadata standards: N/A
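The structure listed in Table 55 can be read into typed records as sketched below. This is a minimal example under assumptions, not Ceneje's tooling: it assumes that product_attributes.tsv is tab-separated with a header row using exactly the column names from the table, and the class name `ProductAttribute`, the function name `read_product_attributes`, and the sample row are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class ProductAttribute:
    id_product: int      # IdProduct (INT)
    name_product: str    # NameProduct (STRING)
    l1: str              # L1 (STRING)
    l2: str              # L2 (STRING)
    l3: str              # L3 (STRING)
    att_name: str        # AttName (STRING)
    att_value: str       # AttValue (STRING)


def read_product_attributes(stream: TextIO) -> List[ProductAttribute]:
    """Parse tab-separated rows (header row assumed) into records."""
    reader = csv.DictReader(stream, delimiter="\t")
    return [
        ProductAttribute(
            id_product=int(row["IdProduct"]),
            name_product=row["NameProduct"],
            l1=row["L1"],
            l2=row["L2"],
            l3=row["L3"],
            att_name=row["AttName"],
            att_value=row["AttValue"],
        )
        for row in reader
    ]


# Invented sample row, for illustration only.
sample = io.StringIO(
    "IdProduct\tNameProduct\tL1\tL2\tL3\tAttName\tAttValue\n"
    "1\tExample phone\tElectronics\tPhones\tSmartphones\tColor\tBlack\n"
)
records = read_product_attributes(sample)
```

Because each row carries one attribute name and value, a product with several attributes would appear on several rows sharing the same IdProduct.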
6.9.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available through file download.
Table 56 MAKING DATA ACCESSIBLE – Product Attributes
Dataset license: Owner: Ceneje. Access: All members
Availability (public | private): private
Availability to EW-Shopp partners (Y | N): Y
Availability method: File download (zip)
Tools to access: WGET / Curl
Dataset source URL: AWS or Ceneje static content server
Access restrictions: Credentials
Keyword/Tags: N/A
Archiving and preservation: NO (can be generated on demand)
Table 57 MAKING DATA INTEROPERABLE – Product Attributes
Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
6.9.5 Dataset SECURITY
The dataset does not contain PD.
Table 58 DATASET SECURITY – Product Attributes
Personal Data (Y | N): N
Anonymized (Y | N | NA): N/A
Data recovery and secure storage: Secure storage, regular backups
Privacy management procedures: N/A
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N
PD - anonymised before project (Y | N): N
Level of Aggregation (for PD anonymized by aggregation): Product level
6.9.6 Ethics and Legal requirements
Based on the above dataset description, the dataset “Product Attributes” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of an opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.10 JSI Dataset - Media: Event Registry
6.10.1 Dataset IDENTIFICATION
The dataset “Event Registry” is proprietary and contains clustered information about events based on online news articles.
Table 59. DATASET IDENTIFICATION – Event Registry
Category: Media
Data name: Event Registry
Description: A registry of news articles which are automatically clustered into events - sets of articles about the same real-world event. The articles are collected from over 150 thousand sources from all over the world and in 21 languages. Article text is processed and annotated using a linguistic and semantic analysis pipeline. The articles and events are linked based on content similarity. These links are made automatically and across different languages.
Provider: JSI
Contact Person: Aljaž Košmerlj
Business Cases number: BC1, BC2, BC3, BC4
6.10.2 Dataset ORIGIN
This dataset is available from January 2017 and it can be defined as “core data”. The dataset already existed before the project.
Table 60 DATASET ORIGIN – Event Registry
Available at (M): M1
Core Data (Y | N): Y
Size: 136 million articles and 4.8 million events
Growth: 150 thousand articles and 400 events added per day
Type and format: text + metadata
Existing data (Y | N): Y
Data origin: online news sites, Event Registry API
6.10.3 Dataset FORMAT
The dataset collects data starting from December 2013, related to the whole Earth, in many languages. The data is updated in real time.
Table 61 DATASET FORMAT – Event Registry
Dataset structure: Full documentation available at: https://github.com/EventRegistry/event-registry-python/wiki/Data-models
Dataset format: JSON
Time coverage: since December 2013
Spatial coverage: Whole Earth
Languages: English, German, Spanish, Catalan, Portuguese, Italian, French, Russian, Chinese, Slovene, Croatian, Serbian, Arabic, Turkish, Persian, Armenian, Kurdish, Lithuanian, Somali, Urdu, Uzbek
Identifiability of data: Y
Naming convention: Wikipedia URIs
Versioning: Real-time
Metadata standards: N/A
6.10.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available through API access.
Table 62 MAKING DATA ACCESSIBLE – Event Registry
Dataset license: Owner: JSI. Access: All members
Availability (public | private): limited open and private (subscription-based); full access will be available to project members
Availability to EW-Shopp partners (Y | N): Y
Availability method: API access
Tools to access: REST, Python API
Dataset source URL: http://eventregistry.org/
Access restrictions: Credentials
Keyword/Tags: news, articles, events
Archiving and preservation: long-term database storage
Table 63 MAKING DATA INTEROPERABLE – Event Registry
Data interoperability:
• Semantic data enrichment
• References to shared systems of identifiers and standard data types
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities
6.10.5 Dataset SECURITY
The dataset does not include PD collected directly from its users. The dataset contains only publicly available PD (mentions of natural persons in news articles) as part of its news archive. PD can be removed upon request by any individual.
Table 64 DATASET SECURITY – Event Registry
Personal Data (Y | N): N
Anonymized (Y | N | NA): N
Data recovery and secure storage: Secure storage, no sensitive data, local backups
Privacy management procedures: "Right to be forgotten" guaranteed
PD at the source (Y | N): Y
PD - anonymised during project (Y | N): N/A
PD - anonymised before project (Y | N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A
6.10.6 Ethics and Legal requirements
JSI has already obtained an opinion of the Slovenian Information Commissioner regarding use of Event Registry data in another EU project (H2020 project RENOIR, grant agreement No 691152). A copy of this opinion and an explanation of why it is also applicable to the EW-Shopp project are included in deliverable [D7.2]. The opinion states that even though Event Registry collects and indexes news data which is publicly available, this may still constitute processing of personal data, and some users may want to have their data removed from the index. This is the so-called “right to be forgotten”, which must also be offered by web search engines such as Google. It can be defined as “the right to silence on past events in life that are no longer occurring” and allows individuals to have information about themselves deleted from certain internet records so that they cannot be found by search engines. To comply with this, Event Registry supports the option to request a removal of personal links from its index. The Information Commissioner does not foresee any other necessary privacy protection measures.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.11 GfK Dataset - Consumer data: Consumer data
6.11.1 Dataset IDENTIFICATION
The dataset “Consumer data” is proprietary and contains data on TV and online behavior and exposure, household and individual purchases, mobile usage, and household and individual demographic and segmentation information.
Table 65. DATASET IDENTIFICATION – Consumer data
Category: Consumer data
Data name: Consumer data
Description: TV Behavior & Exposure, Online Behavior & Exposure, HH & Individual Purchase Level, Mobile Usage, Household & Individual Demographic and Segmentation Information in Italy, Germany, Poland and Netherlands.
Provider: GfK
Contact Person: Stefano Albano
Business Cases number: BC2
6.11.2 Dataset ORIGIN
The dataset is available from May 2017 and it cannot be defined as “core data”. Its size is 80 GB with a growth of 40 GB per year. The dataset already existed before the project.
Table 66 DATASET ORIGIN – Consumer data
Available at (M): M5
Core Data (Y | N): N
Size: 80 GB
Growth: 40 GB per year
Type and format: structured tabular data, CSV
Existing data (Y | N): Y
Data origin: GfK receives the data directly from the panelists, who are connected to GfK via GPRS technology with an ad hoc tablet, via web with a PC/laptop, or via smartphone. Data are collected actively (with questionnaires) or passively (installed apps). Data are anonymized and stored in GfK’s storage systems.
6.11.3 Dataset FORMAT
The dataset has a CSV format. It collects numerical data since 2016 and it covers information related to Italy, Germany, Poland and Netherlands. The data is updated monthly.
Table 67 DATASET FORMAT – Consumer data
Dataset structure: Data are stored in a data warehouse and can be extracted or visualized through software.
Dataset format: structured tabular data, CSV
Time coverage: Monthly / daily data since 2016
Spatial coverage: Italy, Germany, Poland, Netherlands
Languages: EN (numerical data)
Identifiability of data: N/A
Naming convention: Static DB
Versioning: Monthly
Metadata standards: N/A
6.11.4 Dataset ACCESS
The dataset is private and it is not available to consortium members.
Table 68 MAKING DATA ACCESSIBLE – Consumer data
Dataset license: Available only for GfK
Availability (public | private): private
Availability to EW-Shopp partners (Y | N): N
Availability method: N/A
Tools to access: N/A
Dataset source URL: N/A
Access restrictions: N/A
Keyword/Tags: N/A
Archiving and preservation: N/A
Table 69 MAKING DATA INTEROPERABLE – Consumer data
Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
• Temporal ontologies
• Spatial ontologies and locations
6.11.5 Dataset SECURITY
The dataset does not contain PD because it was removed at the source.
Table 70 DATASET SECURITY – Consumer data
Personal Data (Y | N): N
Anonymized (Y | N | NA): Y
Data recovery and secure storage: Y
Privacy management procedures: See 6.11.6
PD at the source (Y | N): Y
PD - anonymised during project (Y | N): N
PD - anonymised before project (Y | N): Y
Level of Aggregation (for PD anonymized by aggregation): Data are not aggregated
6.11.6 Ethics and Legal requirements
GfK collects the data in accordance with the current privacy law, asking each panelist for consent to transfer the data to GfK for data analysis. GfK has notified the National Data Protection Authority (notification attached in [D7.2]).
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.12 GfK Dataset - Market data: Sales data
6.12.1 Dataset IDENTIFICATION
The dataset “Sales data” contains monthly data (in value / number) of Consumer Electronics, Information Technology, Telecommunication, Major Domestic Appliances and Small Domestic Appliances products.
Table 71. Dataset IDENTIFICATION – Sales data
Category: Market data
Data name: Sales data
Description: Monthly data (in value / number) of Consumer Electronics, Information Technology, Telecommunication, Major Domestic Appliances and Small Domestic Appliances products.
Provider: GfK
Contact Person: Alessandro De Fazio
Business Cases number: BC1, BC2, BC3
6.12.2 Dataset ORIGIN
The dataset is available from January 2017 and it cannot be defined as “core data”. Its size is 80 GB per country with a growth of 5 GB per country per year.
Table 72 DATASET ORIGIN – Sales data
Available at (M): M1
Core Data (Y | N): N
Size: 80 GB per country
Growth: 5 GB per country per year
Type and format: structured tabular data, CSV
Existing data (Y | N): Y
Data origin: GfK receives from the POS sales data split per product in different formats (electronic and manual). Data are checked, verified and uploaded into a tool where the data are connected to the product sheet. The data are collected on a representative sample of POS and are extrapolated to the universe.
6.12.3 Dataset FORMAT
The dataset has a CSV format. It collects data since 2004 related to all European countries (except: Albania, Kosovo, Macedonia and Montenegro). The data is updated monthly.
Table 73 DATASET FORMAT – Sales data
Dataset structure: Data are stored in a global data warehouse accessible online. The inputs are four dimensions (Product, Time, Facts, Channels) that can be processed like an Excel pivot table.
Dataset format: structured tabular data, CSV
Time coverage: Monthly data since 2004
Spatial coverage: All European (except: Albania, Kosovo, Macedonia and Montenegro)
Languages: English
Identifiability of data: Yes
Naming convention: Static DB
Versioning: Monthly
Metadata standards: N/A
6.12.4 Dataset ACCESS
The dataset is private and it is available only to Università Bicocca. The data is available through FTP, but username and password are required.
Table 74 MAKING DATA ACCESSIBLE – Sales data
Dataset license: The data will be transferred to Università Bicocca for data analysis, while the analysis (not the data) will be transferred by Università Bicocca to the consortium.
Availability (public | private): Private
Availability to EW-Shopp partners (Y | N): Y
Availability method: CSV files via FTP
Tools to access: No
Dataset source URL: FTP
Access restrictions: username and password needed to access FTP
Keyword/Tags: sales data
Archiving and preservation: N/A
Table 75 MAKING DATA INTEROPERABLE – Sales data
Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
• Temporal ontologies
• Spatial ontologies and locations
6.12.5 Dataset SECURITY
The dataset does not contain PD.
Table 76 DATASET SECURITY – Sales data
Personal Data (Y | N): N
Anonymized (Y | N | NA): N
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N/A
PD - anonymised before project (Y | N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.12.6 Ethics and Legal requirements
Based on the above dataset description, the dataset “Sales Data” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of an opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.13 GfK Dataset – Products & Categories: Product attributes
6.13.1 Dataset IDENTIFICATION
The dataset “Product attributes” contains Technical Product Data Sheets of all the products of the Consumer Electronics, IT, Telecommunication, Major Domestic Appliances and Small Domestic Appliances sectors.
Table 77. DATASET IDENTIFICATION – Product attributes
Category: Products & Categories
Data name: Product attributes
Description: Technical Product Data Sheets of all the products of the Consumer Electronics, IT, Telecommunication, Major Domestic Appliances and Small Domestic Appliances sectors. Product sheets are defined within the GfK categorization and include: Brand, Product name, Model, ID, data, EAN code (on 80% of the products) and Technical features.
Provider: GfK
Contact Person: Marco Tobaldo
Business Cases number: BC1, BC2, BC4
EW_Shopp GAnumber:732590 H2020-ICT-2016-2017/H2020-ICT-2016-1 79
6.13.2 Dataset ORIGIN
The dataset is available from February 2017 and can be defined as “core data”. Its size is 2 GB per country (Germany, UK, Italy), with a growth of 2% per year.
Table 78 DATASET ORIGIN – Product attributes
Available at (M): M2
Core Data (Y | N): Y
Size: 2 GB per country (Germany, UK, Italy)
Growth: 2% per year
Type and format: Relational
Existing data (Y | N): Y
Data origin: GfK receives the data of all the products sold at POS. When there is a new product, GfK sets up its sheet, getting the features of the product from the manufacturer. All the sheets are created manually, according to the GfK data plan, in the country where the new product has been sold.
6.13.3 Dataset FORMAT
The dataset is in CSV or XML format. It collects product data since 1982 and has European coverage. The dataset is updated daily (every day the dataset contains only the newly generated data).
Table 79 DATASET FORMAT – Product attributes
Dataset structure: We describe here the main structure of the relational database (RDB) by describing the five CSV files that we extract from it and share in EW-Shopp:
• Country_EWS_2017_12_31_Feature_Data.txt (values of the technical features of the products)
• Country_EWS_2017_12_31_Feature_List.txt (names of the features of the products)
• Country_EWS_2017_12_31_Feature_Value_List.txt (code frame of the features)
• Country_EWS_2017_12_31_Master_Data.txt (main information about the products)
• Country_EWS_2017_12_31_Productgroup_Feature_List.txt (list of the technical features available for each product)
Each file contains several columns; for the complete structure we refer to the documentation in "Spex_retail_CSVrelationalidbased.pdf", shared with the consortium.
Dataset format: structured (R-DB), CSV or XML
Time coverage: The dataset includes product data since 1982 and is updated daily
Spatial coverage: European coverage: Austria, Belgium, Denmark, Finland, France, Germany, UK, Greece, Italy, Luxembourg, Netherlands, Poland, Portugal, Czech Republic, Slovakia, Sweden, Norway, Hungary. Catalogue not available in Ireland, Slovenia, Croatia, Bulgaria, Cyprus, Estonia, Latvia, Lithuania, Malta, Romania.
Languages: Arabic, Czech, Chinese, Korean, Danish, French, Greek, English, Italian, Dutch, Polish, Portuguese, Russian, Slovak, Spanish, Swedish, German, Turkish, Hungarian
Identifiability of data: Yes
Naming convention:
• Country_EWS_2017_12_31_Feature_Data.txt
• Country_EWS_2017_12_31_Feature_List.txt
• Country_EWS_2017_12_31_Feature_Value_List.txt
• Country_EWS_2017_12_31_Master_Data.txt
• Country_EWS_2017_12_31_Productgroup_Feature_List.txt
Versioning: Daily (every day the dataset contains only the newly generated data). Old data are overwritten.
Metadata standards: N/A
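The naming convention above can be illustrated with a minimal sketch that builds the expected file names for one extraction. It assumes, without confirmation from GfK, that the leading "Country" token is a placeholder for a country name or code and that "2017_12_31" is a date in YYYY_MM_DD form; the function name `gfk_file_names` is hypothetical.

```python
from datetime import date

# File-name suffixes copied from the naming convention in Table 79.
GFK_FILE_SUFFIXES = (
    "Feature_Data",
    "Feature_List",
    "Feature_Value_List",
    "Master_Data",
    "Productgroup_Feature_List",
)


def gfk_file_names(country: str, extraction_date: date) -> list[str]:
    """Build the expected file names for one country and one extraction date.

    Assumption: 'Country' in the convention is a placeholder and the
    numeric part is a YYYY_MM_DD date.
    """
    stamp = extraction_date.strftime("%Y_%m_%d")
    return [f"{country}_EWS_{stamp}_{suffix}.txt" for suffix in GFK_FILE_SUFFIXES]


names = gfk_file_names("Italy", date(2017, 12, 31))
for name in names:
    print(name)
```

A consumer of the FTP drop could use such a list to check that a daily delivery is complete before loading it.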
6.13.4 Dataset ACCESS
The dataset is private and is available to all consortium members. The data are available through FTP.
Table 80 MAKING DATA ACCESSIBLE – Product attributes
Dataset license: Private license: The data will be transferred to Università Bicocca for data analysis, while the analysis (not the data) will be transferred by Università Bicocca to the consortium
Availability (public | private): Private
Availability to EW-Shopp partners (Y | N): Y
Availability method: FTP
Tools to access: No tools
Dataset source URL: It will be created when needed
Access restrictions: username and password needed to access FTP
Keyword/Tags: product categories / product features / value
Archiving and preservation: Regular disaster recovery / backup on original data
Table 81 MAKING DATA INTEROPERABLE – Product attributes
Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
6.13.5 Dataset SECURITY
The dataset does not contain PD.
Table 82 DATASET SECURITY – Product attributes
Personal Data (Y | N): N
Anonymized (Y | N | NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N/A
PD - anonymised before project (Y | N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A
6.13.6 Ethics and Legal requirements
Based on the above dataset description, the dataset “Product attributes” does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.14 ME Dataset - Consumer Data: Door counter data
6.14.1 Dataset IDENTIFICATION
The dataset “Door counter data” contains data from customers' door counters.
Table 83. DATASET IDENTIFICATION – Door counter data
Category: Consumer Data
Data name: Door counter data
Description: Data from customers' door counters
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC3
6.14.2 Dataset ORIGIN
The dataset is available from January 2017 and cannot be defined as “core data”. Its size is 2 Mb, with a growth of 60 kB/mb/location. The dataset already existed.
Table 84 DATASET ORIGIN – Door counter data
Available at (M): M1
Core Data (Y | N): N
Size: 2 Mb
Growth: 60 kB/mb/location
Type and format: structured data
Existing data (Y | N): Y
Data origin: Measurence's customers' own data
6.14.3 Dataset FORMAT
The dataset is in CSV format. It collects numerical data since 2016 related to the Milan area. The dataset is updated daily.
Table 85 DATASET FORMAT – Door counter data
Dataset structure: N/A because there is no access to the data through URL
Dataset format: CSV
Time coverage: 2016
Spatial coverage: Milan
Languages: EN (numerical data)
Identifiability of data: N/A
Naming convention: /location_idYYYY/MM/week
Versioning: Daily
Metadata standards: N/A
6.14.4 Dataset ACCESS
The dataset is private and is not available to all consortium members.
Table 86 MAKING DATA ACCESSIBLE – Door counter data
Dataset license: Owner: ME.
Availability (public | private): Private
Availability to EW-Shopp partners (Y | N): N
Availability method: CSV
Tools to access: text editor / spreadsheet
Dataset source URL: N/A
Access restrictions: N/A
Keyword/Tags: door counters
Archiving and preservation: cloud
Table 87 MAKING DATA INTEROPERABLE – Door counter data
Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
6.14.5 Dataset SECURITY
The dataset does not contain PD.
Table 88 DATASET SECURITY – Door counter data
Personal Data (Y | N): N
Anonymized (Y | N | NA): N
Data recovery and secure storage: Y
Privacy management procedures: N/A
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N
PD - anonymised before project (Y | N): N
Level of Aggregation (for PD anonymized by aggregation): N
6.14.6 Ethics and Legal requirements
This dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.15 BB Dataset - Products and Categories: Product Attributes
6.15.1 Dataset IDENTIFICATION
The dataset “Product attributes” is proprietary and contains data on product specifications.
Table 89. DATASET IDENTIFICATION – Product Attributes
Category: Products and categories
Data name: Product attributes
Description: Detailed product specifications for products which are included in Big Bang's selling portfolio (from generic to specific technical details)
Provider: Big Bang
Contact Person: Matija Torlak
Business Cases number: BC1
6.15.2 Dataset ORIGIN
The dataset is available from January 2017 and can be defined as “core data”. Its size is 20,000 products, with a growth of 1,000 new products per year.
Table 90 DATASET ORIGIN – Product Attributes
Available at (M): M1
Core Data (Y | N): Y
Size: 20,000 products
Growth: 1,000 new products per year
Type and format: character and numeric
Existing data (Y | N): Y
Data origin: DWH
6.15.3 Dataset FORMAT
The dataset is in XLS format. It collects data related to Slovenia, in the Slovenian and English languages. The dataset is updated daily.
Table 91 DATASET FORMAT – Product Attributes
Dataset structure: BB Classification - can be mostly matched with GS1 Classification
Dataset format: XLS, SQL, CSV
Time coverage: All Time
Spatial coverage: Slovenia for all Products
Languages: Slovenian, English
Identifiability of data: Yes
Naming convention: BB_productCategoriesYYYY/MM/dd
Versioning: daily (new + history)
Metadata standards: N/A
6.15.4 Dataset ACCESS
The dataset is private but is available to all consortium members. The data are available through download by means of VPN.
Table 92 MAKING DATA ACCESSIBLE – Product Attributes
Dataset license: Owner: Big Bang. Access: All members
Availability (public | private): Public, restricted with credentials
Availability to EW-Shopp partners (Y | N): Y
Availability method: Download, view
Tools to access: URL with Credentials
Dataset source URL: URL link secured with Credentials
Access restrictions: Credentials
Keyword/Tags: Database Keywords
Archiving and preservation: Secure Storage, Backup
Table 93 MAKING DATA INTEROPERABLE – Product Attributes
Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
6.15.5 Dataset SECURITY
The dataset does not contain PD.
Table 94 DATASET SECURITY – Product Attributes
Personal Data (Y | N): N
Anonymized (Y | N | NA): N/A
Data recovery and secure storage: Secure storage, daily backup
Privacy management procedures: Data only on the level of product / category
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N
PD - anonymised before project (Y | N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.15.6 Ethics and Legal requirements
Based on the above dataset description, the dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.16 CE Dataset - Market data: Products price history
6.16.1 Dataset IDENTIFICATION
The dataset “Products price history” is proprietary and contains quotes for various products.
Table 95. DATASET IDENTIFICATION – Products price history
Category: Market data
Data name: Products price history
Description: A collection of seller quotes for products. Prices for all of Ceneje's organized products have been recorded and regularly archived since 2016.
Provider: Ceneje
Contact Person: David Creslovnik, Uros Mevc
Business Cases number: BC1
6.16.2 Dataset ORIGIN
The dataset is available from January 2017 and cannot be defined as “core data”. Its size is about 3 billion quotes, with a growth of 2 million per day.
Table 96 DATASET ORIGIN – Products price history
Available at (M): M1
Core Data (Y | N): N
Size: about 3 billion quotes
Growth: 2 million per day
Type and format: structured tabular data
Existing data (Y | N): Y
Data origin: SQL tables
6.16.3 Dataset FORMAT
The dataset collects data related to the Country area since 2016. The dataset is updated daily.
Table 97 DATASET FORMAT – Products price history
Dataset structure: History
- IdProduct (INT)
- NameProduct (STRING)
- L1 (STRING)
- L2 (STRING)
- L3 (STRING)
- IdSeller (INT)
- Price (MONEY)
- Timestamp (Slovenian time, GMT+1)
Dataset format: SQL: tabular (tsv)
Time coverage: since 2016
Spatial coverage: Country
Languages: not language specific
Identifiability of data: Yes
Naming convention: {country}/YYYY/mm/DD/history.tsv
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A
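As an illustration of the structure and naming convention in Table 97, the following minimal sketch builds a daily file path and parses rows of a history file. Several details are assumptions not stated in this document: that the file has no header row, that columns appear in the order listed in the table, that prices use a dot as decimal separator, and that timestamps are written as "YYYY-MM-DD HH:MM:SS". The helper names are hypothetical.

```python
import csv
import io
from datetime import date, datetime, timedelta, timezone
from decimal import Decimal

# Column order as listed in Table 97 (assumed to match the file layout).
COLUMNS = ["IdProduct", "NameProduct", "L1", "L2", "L3",
           "IdSeller", "Price", "Timestamp"]
# Table 97 states that timestamps are in Slovenian time, GMT+1.
SLOVENIAN_TIME = timezone(timedelta(hours=1))


def history_path(country: str, day: date) -> str:
    """Build a relative path following {country}/YYYY/mm/DD/history.tsv."""
    return f"{country}/{day:%Y/%m/%d}/history.tsv"


def parse_history(tsv_text: str) -> list[dict]:
    """Parse tab-separated quote rows into typed records."""
    records = []
    for raw in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        record = dict(zip(COLUMNS, raw))
        record["IdProduct"] = int(record["IdProduct"])
        record["IdSeller"] = int(record["IdSeller"])
        record["Price"] = Decimal(record["Price"])
        # Assumed timestamp layout; adjust to the real export format.
        record["Timestamp"] = datetime.strptime(
            record["Timestamp"], "%Y-%m-%d %H:%M:%S"
        ).replace(tzinfo=SLOVENIAN_TIME)
        records.append(record)
    return records


# Invented sample row, for illustration only.
sample = "1\tPhone X\tElectronics\tPhones\tSmartphones\t7\t199.90\t2017-06-30 12:00:00\n"
rows = parse_history(sample)
path = history_path("si", date(2017, 6, 30))
```

`Decimal` is used for the MONEY column so that prices are not subject to binary floating-point rounding.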
6.16.4 Dataset ACCESS
The dataset is private but is available to all consortium members. The data are available through file download.
Table 98 MAKING DATA ACCESSIBLE – Products price history
Dataset license: Owner: Ceneje. Access: All members
Availability (public | private): private
Availability to EW-Shopp partners (Y | N): Y
Availability method: File download (zip)
Tools to access: WGET / Curl
Dataset source URL: N/A
Access restrictions: Credentials
Keyword/Tags: N/A
Archiving and preservation: N/A (can be generated on demand)
Table 99 MAKING DATA INTEROPERABLE – Products price history
Data interoperability:
• Publication as linked data (RDF-ization)
• Semantic data enrichment
Standard vocabulary:
• Interlinked product classification
• Linked product data
• Temporal ontologies
6.16.5 Dataset SECURITY
The dataset does not contain PD.
Table 100 DATASET SECURITY – Products price history
Personal Data (Y | N): N
Anonymized (Y | N | NA): N/A
Data recovery and secure storage: Secure storage, regular backups
Privacy management procedures: N/A
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N
PD - anonymised before project (Y | N): N
Level of Aggregation (for PD anonymized by aggregation): Product | Seller level
6.16.6 Ethics and Legal requirements
Based on the above dataset description, the dataset “Products price history” does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.17 ME Dataset - Consumer Data: Sales data
6.17.1 Dataset IDENTIFICATION
The dataset “Sales data” contains the number of receipts obtained from customers.
Table 101. DATASET IDENTIFICATION – Sales data
Category: Consumer Data
Data name: Sales data
Description: number of receipts we get from our customers
Provider: Measurence
Contact Person: Olga Melnyk
Business Cases number: BC2
6.17.2 Dataset ORIGIN
The dataset is available from January 2017 and cannot be defined as “core data”. Its size is about 2 Mb, with a growth of 60 kB/mb/location.
Table 102 DATASET ORIGIN – Sales data
Available at (M): M1
Core Data (Y | N): N
Size: 2 Mb
Growth: 60 kB/mb/location
Type and format: structured data
Existing data (Y | N): Y
Data origin: Measurence customers' own data
6.17.3 Dataset FORMAT
The dataset collects data related to the Milan area since 2016. The dataset is updated weekly.
Table 103 DATASET FORMAT – Sales data
Dataset structure: N/A because there is no access to the data through URL
Dataset format: CSV
Time coverage: 2016
Spatial coverage: Milan
Languages: EN (numerical data)
Identifiability of data: N/A
Naming convention: /location_id/YYYY/MM/week
Versioning: weekly
Metadata standards: N/A
6.17.4 Dataset ACCESS
The dataset is private and is not available to consortium members.
Table 104 MAKING DATA ACCESSIBLE – Sales data
Dataset license: Owner: ME
Availability (public | private): private
Availability to EW-Shopp partners (Y | N): N
Availability method: CSV
Tools to access: text editor / spreadsheet
Dataset source URL: N/A because the company's dataset is not available through URL
Access restrictions: N/A
Keyword/Tags: receipts
Archiving and preservation: Cloud
Table 105 MAKING DATA INTEROPERABLE – Sales data
Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
6.17.5 Dataset SECURITY
The dataset does not contain PD.
Table 106 DATASET SECURITY – Sales data
Personal Data (Y | N): N
Anonymized (Y | N | NA): N
Data recovery and secure storage: Y
Privacy management procedures: N/A
PD at the source (Y | N): N
PD - anonymised during project (Y | N): N
PD - anonymised before project (Y | N): N
Level of Aggregation (for PD anonymized by aggregation): N
6.17.6 Ethics and Legal requirements
This dataset does not contain personal data; therefore the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.18 JOT Dataset - Consumer data: Traffic source (Bing)
6.18.1 Dataset IDENTIFICATION
The dataset “Traffic sources (Bing)”, provided by JOT, focuses on historical campaign performance statistics of search data in Bing advertising platforms.
Table 107 DATASET IDENTIFICATION – Traffic source (Bing)
Category: Consumer Data
Data name: Traffic sources (Bing)
Description: Historical campaign performance statistics of search data in Bing advertising platforms
Provider: JOT
Contact Person: Ignacio Martínez / Elías Badenes
Business Cases number: BC4
6.18.2 Dataset ORIGIN
This dataset is available from February 2017 and cannot be defined as “core data”. It has a structured format, with a size of 1 TB and a growth of 1.5 GB daily. The dataset is generated expressly for the project's purpose, in CSV format.
Table 108 DATASET ORIGIN – Traffic source (Bing)
Available at (M): M2
Core Data (Y | N): N
Size: 1 TB
Growth: 1.5 GB daily
Type and format: structured, CSV
Existing data (Y | N): N
Data origin: BING API
6.18.3 Dataset FORMAT
The dataset “Traffic source (Bing)” is in CSV format; the data structure is illustrated in the following table. It collects data gathered from different European countries, in different languages (German, Spanish, French, English), since 2016, and it covers information related to City/Region/Country. The data are updated daily, which means that every day the dataset contains only the newly generated data.
Table 109 DATASET FORMAT – Traffic source (Bing)
Dataset structure:
• Country: Country at which the campaign is targeted.
• Language: Language of the keywords and ads.
• Category: Topic of the keyword. We have 22 categories such as Travel, Finance, Vehicles and so forth (see footnote 58).
• Campaign Name: An account is formed by campaigns. The names of these campaigns contain some information, such as the language or the category.
• AdgroupId: Number given by Bing that identifies an ad group. A campaign is formed by ad groups.
• AdNetworkType2: The network where keywords appear. It can be Bing search (the typical Bing search engine at www.bing.com) or partner network (other web pages with the Bing search box).
• Clicks: When a user clicks your ad.
• Impressions: Each time your ad is served and appears on the web.
• Date: Date (XXXX/XX/XX) when the ad appears.
• DayOfWeek: Day of the week when the ad appears.
• Device: The device (PC, Tablet, Mobile) where the ad appears.
• MonthOfYear: Month of the year when the ad appears.
• Keyword: The search that the user types.
• Bing_posicion_anuncio (Bing_Ad_Position): Position of the ad in the browser.
• Location: City/Region/Country
• Concordancia (Match type): Match type of the keyword. It shows how similar a user's query needs to be for an ad to be shown.
Dataset format: CSV
Time coverage: since 2016
Spatial coverage: City/Region/Country
Languages: German, Spanish, French, English
Identifiability of data: Yes
Naming convention: BING_YYMMDD_XX
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A
58 Taxonomy behind the categories used in Bing can be found here: https://advertiseonbing.blob.core.windows.net/blob/bingads/media/library/o/blogpost/june%202015/bing_category_taxonomy.txt
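To illustrate how the Clicks, Impressions and Device fields described above might be used together, here is a minimal sketch that aggregates a click-through rate (clicks divided by impressions) per device. The click-through rate is an illustrative derived metric, not something this plan defines. The sketch assumes a comma-separated file with a header row whose column names match the field names in Table 109; the actual delivered layout may differ, and `ctr_by_device` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def ctr_by_device(csv_text: str) -> dict[str, float]:
    """Aggregate clicks / impressions per device.

    Assumes a header row with 'Device', 'Clicks' and 'Impressions' columns.
    """
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        clicks[row["Device"]] += int(row["Clicks"])
        impressions[row["Device"]] += int(row["Impressions"])
    # Skip devices with zero impressions to avoid division by zero.
    return {
        device: clicks[device] / total
        for device, total in impressions.items()
        if total > 0
    }


# Invented sample data, for illustration only.
sample = "Device,Clicks,Impressions\nPC,5,100\nMobile,2,50\nPC,5,100\n"
result = ctr_by_device(sample)
```

Because each daily file contains only newly generated data, a longer-term rate would be computed by summing clicks and impressions across files before dividing, rather than averaging the daily rates.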
6.18.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available through file download by means of an FTP client. The datasets are deposited on the Azure platform and access is granted through credentials.
Table 110 MAKING DATA ACCESSIBLE – Traffic source (Bing)
Dataset license: Owner: JOT. Access: All members
Availability (public | private): private
Availability to EW-Shopp partners (Y | N): Yes
Availability method: File-download
Tools to access: FTP Client (Open Source) or Web Page
Dataset source URL: Azure platform. The URL will be created when needed.
Access restrictions: Credentials
Keyword/Tags: Online Searches (Keywords)
Archiving and preservation: 5 years after project end
Table 111 MAKING DATA INTEROPERABLE – Traffic source (Bing)
Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
6.18.5 Dataset SECURITY
The dataset “Traffic source (Bing)” does not contain personal data. Secure storage and JOT data recovery are expected.
Table 112 DATASET SECURITY – Traffic source (Bing)
Personal Data (Y | N): N
Anonymized (Y | N | NA): N/A
Data recovery and secure storage: Secure storage, no sensitive data, JOT data recovery
Privacy management procedures: N/A
PD at the source (Y | N): N/A
PD - anonymised during project (Y | N): N/A
PD - anonymised before project (Y | N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A
6.18.6 Ethics and Legal requirements
All the data that JOT Internet is generating, sharing and processing (in compliance with Spanish Organic Law 15/1999 for personal data protection, ISO/IEC 2382-1 and the General Data Protection Regulation (GDPR)) for the purpose of the EW-Shopp project do not include personal data. For that reason, JOT believes that the data managed in the project do not include any personal data, and no further action is needed.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.19 JOT Dataset - Consumer data: Traffic source (Google)
6.19.1 Dataset IDENTIFICATION
The dataset “Traffic sources (Google)”, provided by JOT, focuses on historical campaign performance statistics of search data in Google platforms.
Table 113 DATASET IDENTIFICATION – Traffic source (Google)
Category: Consumer Data
Data name: Traffic sources (Google)
Description: Historical campaign performance statistics of data in Google platform.
Provider: JOT
Contact Person: Ignacio Martínez / Elías Badenes
Business Cases number: BC4
6.19.2 Dataset ORIGIN
The dataset is available from February 2017 and is defined as “core data”. It has a structured format (i.e. CSV), with a size up to 3 TB and a growth of 4 GB daily. The dataset is generated expressly for the project's purpose.
Table 114 DATASET ORIGIN – Traffic source (Google)
Available at (M): M2
Core Data (Y | N): Y
Size: > 3 TB
Growth: 4 GB daily
Type and format: structured, CSV
Existing data (Y | N): N
Data origin: GOOGLE API
6.19.3 Dataset FORMAT
The dataset “Traffic source (Google)” is in CSV format. It collects data gathered from different countries, in different languages (German, Spanish, Italian, Dutch, French, English, Portuguese, Russian), since 2016, and it covers information related to City/Region/Country. The data are updated daily, which means that every day the dataset contains only the newly generated data. The data structure is illustrated in the following table.
Table 115 DATASET FORMAT – Traffic source (Google)
Dataset structure:
• Country: Country at which the campaign is targeted.
• Language: Language of the keywords and ads.
• Category: Topic of the keyword. We have 22 categories such as Travel, Finance, Vehicles and so forth (see footnote 59).
• Campaign Name: An account is formed by campaigns. The names of these campaigns contain some information, such as the language or the category.
• AdgroupId: Number given by Google that identifies an ad group. A campaign is formed by ad groups.
• AdNetworkType2: The network where keywords appear. It can be Google search (the typical Google search engine at www.google.com) or partner network (other web pages with the Google search box).
• Clicks: When a user clicks your ad.
• Impressions: Each time your ad is served and appears on the web.
• Date: Date (XXXX/XX/XX) when the ad appears.
• DayOfWeek: Day of the week when the ad appears.
• Device: The device (PC, Tablet, Mobile) where the ad appears.
• MonthOfYear: Month of the year when the ad appears.
• Keyword: The search that the user types.
• Google_posicion_anuncio (Google_Ad_Position): Position of the ad in the browser.
• Location: City/Region/Country
• Concordancia (Match type): Match type of the keyword. It shows how similar a user's query needs to be for an ad to be shown.
Dataset format: CSV
Time coverage: since 2016
Spatial coverage: City/Region/Country
Languages: German, Spanish, Italian, Dutch, French, English, Portuguese, Russian
Identifiability of data: Yes
Naming convention: GOOGLE_YYMMDD_XX
Versioning: Daily (every day the dataset contains only the newly generated data)
Metadata standards: N/A
59 Taxonomy behind the categories used in Google can be found here: https://support.google.com/ads/answer/2842480?hl=en
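The naming convention GOOGLE_YYMMDD_XX can be decoded mechanically. The sketch below is one possible way to do so, under stated assumptions: YYMMDD is taken to be a two-digit year, month and day, and XX is treated as an opaque two-character suffix, since this document does not say what it encodes. An optional ".csv" extension is tolerated as a guess. The names `JotFileName` and `parse_jot_file_name` are hypothetical.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class JotFileName(NamedTuple):
    source: str   # e.g. "GOOGLE"
    day: date     # decoded from YYMMDD
    suffix: str   # the "XX" part; its meaning is not specified in the plan


# SOURCE_YYMMDD_XX, optionally followed by ".csv" (the extension is an assumption).
_PATTERN = re.compile(
    r"^(?P<source>[A-Z]+)_(?P<day>\d{6})_(?P<suffix>[A-Za-z0-9]{2})(?:\.csv)?$"
)


def parse_jot_file_name(name: str) -> JotFileName:
    """Split a file name of the form SOURCE_YYMMDD_XX into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    day = datetime.strptime(match["day"], "%y%m%d").date()
    return JotFileName(match["source"], day, match["suffix"])


parsed = parse_jot_file_name("GOOGLE_170630_ES")
```

Since every daily file contains only newly generated data, the decoded date is what lets a consumer order the files and detect missing days.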
6.19.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available through file download by means of an FTP client. The datasets are deposited on the Azure platform and access is granted through credentials.
Table 116 MAKING DATA ACCESSIBLE – Traffic source (Google)
Dataset license: Owner: JOT. Access: All members
Availability (public | private): private
Availability to EW-Shopp partners (Y | N): Yes
Availability method: File-download
Tools to access: FTP Client (Open Source) or Web Page
Dataset source URL: Azure platform. The URL will be created when needed.
Access restrictions: Credentials
Keyword/Tags: Online Searches (Keywords)
Archiving and preservation: 5 years after project end
Table 117 MAKING DATA INTEROPERABLE – Traffic source (Google)
Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
6.19.5 Dataset SECURITY
The dataset “Traffic source (Google)” does not contain personal data. Secure storage and JOT data recovery are expected.
Table 118 DATASET SECURITY – Traffic source (Google)
Personal Data (Y | N): N
Anonymized (Y | N | NA): N/A
Data recovery and secure storage: Secure storage, no sensitive data, JOT data recovery
Privacy management procedures: N/A
PD at the source (Y | N): N/A
PD - anonymised during project (Y | N): N/A
PD - anonymised before project (Y | N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A
6.19.6 Ethics and Legal requirements
All the data that JOT Internet is generating, sharing and processing (in compliance with Spanish Organic Law 15/1999 for personal data protection, ISO/IEC 2382-1 and the General Data Protection Regulation (GDPR)) for the purpose of the EW-Shopp project do not include personal data. For that reason, JOT believes that the data managed in the project do not include any personal data, and no further action is needed.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.20 JOT Dataset - Market data: Twitter trends
6.20.1 Dataset IDENTIFICATION
The dataset “Twitter Trends” is Open data and focuses on trending topics as available through Twitter APIs.
Table 119 DATASET IDENTIFICATION – Twitter trends
Category: Market data
Data name: Twitter Trends
Description: Trending topics as available through Twitter APIs
Provider: Open Data
Contact Person: Ignacio Martínez / Elías Badenes
Business Cases number: BC4
6.20.2 Dataset ORIGIN
The dataset “Twitter Trends” is available from May 2017 and cannot be defined as “core data”. It has a structured format, with a growth of 10 MB daily. The dataset is generated expressly for the project's purpose.
Table 120 DATASET ORIGIN – Twitter trends
Available at (M): M5
Core Data (Y | N): N
Size: N/A
Growth: 50 trending topics / every 15 min / country (10 MB daily)
Type and format: structured, CSV
Existing data (Y | N): N
Data origin: Twitter API
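The growth figure in Table 120 can be unpacked with a short calculation. The sketch below assumes that each snapshot contributes one CSV row per trending topic, which the plan does not state explicitly. The number of countries is not given in this section, so it is left as a parameter, and the value used in the example is purely illustrative.

```python
TOPICS_PER_SNAPSHOT = 50        # from Table 120
SNAPSHOT_INTERVAL_MINUTES = 15  # from Table 120
MINUTES_PER_DAY = 24 * 60


def rows_per_country_per_day() -> int:
    """Rows per country per day, assuming one row per topic per snapshot."""
    snapshots_per_day = MINUTES_PER_DAY // SNAPSHOT_INTERVAL_MINUTES  # 96
    return snapshots_per_day * TOPICS_PER_SNAPSHOT


def average_bytes_per_row(daily_bytes: int, countries: int) -> float:
    """Average row size implied by a daily volume and a country count."""
    return daily_bytes / (rows_per_country_per_day() * countries)


rows = rows_per_country_per_day()
# Hypothetical example: 10 MB per day spread over 10 countries.
example = average_bytes_per_row(10_000_000, countries=10)
```

With 96 snapshots per day, this gives 4,800 rows per country per day under the stated assumption.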
6.20.3 Dataset FORMAT
The dataset “Twitter trends” is in CSV format; the data structure is illustrated in the following table. The dataset does not depend on language. Its spatial coverage is the country, and it collects data since May 2017. The data are updated daily.
Table 121 DATASET FORMAT – Twitter trends
Dataset structure:
• Location: Country of the hashtag.
• Date: Day of the list.
• Hashtag: Name of the hashtag.
• Promoted_Content: Shows whether a hashtag is promoted or not.
• Tweets_Volume: Number of tweets of a hashtag.
• Relevance: Hashtag's position.
Dataset format: CSV
Time coverage: M5
Spatial coverage: Country
Languages: N/A
Identifiability of data: Yes
Naming convention: TWITTER_YYMMDD_XX
Versioning: Daily
Metadata standards: N/A
6.20.4 Dataset ACCESS
The dataset is private, but it is accessible to all the consortium members. The data will be made available through file download by means of an FTP client. The datasets are deposited on the Azure platform and access is granted through credentials.
Table 122 MAKING DATA ACCESSIBLE – Twitter trends
Dataset license: Owner: JOT. Access: All members
Availability (public | private): private
Availability to EW-Shopp partners (Y | N): Yes
Availability method: File-download
Tools to access: FTP Client (Open Source) or Web Page
Dataset source URL: Azure platform. The URL will be created when needed.
Access restrictions: Credentials
Keyword/Tags: Hashtags
Archiving and preservation: 5 years after project end
A standard vocabulary or taxonomy is not available for the “Twitter trends” dataset.
Table 123 MAKING DATA INTEROPERABLE – Twitter trends
Data interoperability:
• Semantic data enrichment
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities
6.20.5 Dataset SECURITY
The dataset “Twitter trends” does not contain personal data. Secure storage and JOT data recovery are expected.
Table 124 DATASET SECURITY – Twitter trends
Personal Data (Y | N): N
Anonymized (Y | N | NA): N/A
Data recovery and secure storage: Secure storage, no sensitive data, JOT data recovery
Privacy management procedures: N/A
PD at the source (Y | N): N/A
PD - anonymised during project (Y | N): N/A
PD - anonymised before project (Y | N): N/A
Level of Aggregation (for PD anonymized by aggregation): N/A
6.20.6 Ethics and Legal requirements
All the data that JOT Internet is generating, sharing and processing (in compliance with Spanish Organic Law 15/1999 for personal data protection, ISO/IEC 2382-1 and the General Data Protection Regulation (GDPR)) for the purpose of the EW-Shopp project do not include personal data. For that reason, JOT believes that the data managed in the project do not include any personal data, and no further action is needed.
There are no ethical issues that can have an impact on sharing this dataset. All data are returned by an analytics engine that provides only aggregated data about users grouped by specific characteristics, taking all the necessary measures to avoid discrimination, stigmatization, limitation to free association, etc.
6.21 LOD Dataset - Geographic: DBpedia
6.21.1 Dataset IDENTIFICATION
The dataset “DBpedia” is publicly available and contains factual information from different areas of human knowledge extracted from Wikipedia pages.
Table 125. DATASET IDENTIFICATION – DBpedia
Category: Geographic Dataset
Data name: DBpedia
Description: DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. The English version of the DBpedia knowledge base describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology, including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases
Provider: LOD - Access facilitated by UNIMIB
Contact Person: Andrea Maurino
Business Cases number: BC1, BC2, BC3, BC4
6.21.2 Dataset ORIGIN
The dataset is available from January 2017 and cannot be defined as “core data”.
Table 126 DATASET ORIGIN – DBpedia
Available at (M): M1
Core Data (Y | N): N
Size: 735,000 places (including 478,000 populated places)
Growth: Not a fixed number, e.g. DBpedia 3.8: 2.8 GB, DBpedia 3.9: 2.4 GB, while DBpedia 2015-04: 4.7 GB. More info: http://wiki.dbpedia.org/downloads-2016-04
Type and format: RDF, tuples
Existing data (Y | N): Y
Data origin: http://wiki.dbpedia.org/datasets
6.21.3 Dataset FORMAT
The dataset has worldwide coverage, contains data up to October 2016, and is available in 125 languages.
Table 127 DATASET FORMAT – DBpedia
Dataset structure: Provides data in N-Triples format (<subject> <predicate> <object> .)
Dataset format: .ttl, .tql
Time coverage: up to 10/2016
Spatial coverage: Global
Languages: Localized versions of DBpedia in 125 languages: English, German, Spanish, Catalan, Portuguese, Italian, French, Russian, Chinese, Slovenian, Croatian, Serbian, Arabic, Turkish, etc.
Identifiability of data: No
Naming convention: dbpedia_version/year
Versioning: No
Metadata standards: Yes: DBO, FOAF, SCHEMA.ORG, SKOS, etc.
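As an illustration of the N-Triples structure named in Table 127, the following Python sketch parses a single `<subject> <predicate> <object> .` line. It is a simplified reader written for this example only: the function name `parse_triple` and the sample triple are assumptions, and the pattern does not handle blank nodes or escaped quotes, which a full RDF library would cover.

```python
import re

# Simplified N-Triples line pattern: IRI subject, IRI predicate, and an object
# that is either an IRI or a plain literal (optionally with a language tag or
# datatype). Blank nodes and escape sequences in literals are NOT handled.
TRIPLE_RE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<o_iri>[^>]*)>'
    r'|"(?P<o_lit>[^"]*)"(?:@(?P<lang>[A-Za-z-]+)|\^\^<(?P<dtype>[^>]*)>)?)'
    r'\s*\.\s*$'
)


def parse_triple(line):
    """Return (subject, predicate, object) for one simple line, or None."""
    match = TRIPLE_RE.match(line.strip())
    if match is None:
        return None  # comment, empty line, or unsupported syntax
    obj = match.group("o_iri")
    if obj is None:
        obj = match.group("o_lit")
    return match.group("s"), match.group("p"), obj


# Hypothetical sample line in the style of a DBpedia labels dump.
sample = (
    '<http://dbpedia.org/resource/Milan> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Milan"@en .'
)
triple = parse_triple(sample)
print(triple)
```

Because each triple occupies exactly one line, a dump in this format can be processed line by line without loading the whole file into memory.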
6.21.4 Dataset ACCESS
The dataset is public and it is accessible to all the consortium members.
Table 128 MAKING DATA ACCESSIBLE – DBpedia
Dataset license: GNU Free Documentation License.
Availability (public|private): Public
Availability to EW-Shopp partners (Y|N): Y
Availability method: SPARQL endpoint, dump
Tools to access: web service (REST/SOAP APIs), query endpoint
Dataset source URL: http://wiki.dbpedia.org/datasets
Access restrictions: No access restriction
Keyword/Tags: cross-domain: places, person, films, food, music, history, etc.
Archiving and preservation: N/A
Table 129 MAKING DATA INTEROPERABLE – DBpedia
Data interoperability:
• N/A (Linked Open Data)
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities
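Access through the SPARQL endpoint listed in Table 128 could look like the following Python sketch, which only builds the query and the request URL and does not contact the network. The endpoint address `https://dbpedia.org/sparql`, the helper names and the choice of query (populated places of one country) are assumptions made for illustration; the plan itself does not prescribe a query.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (an assumption: the plan only says
# "SPARQL endpoint" without giving its address).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_place_query(country_iri, limit=10):
    """Build a SPARQL query for populated places located in one country."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?place ?label WHERE {\n"
        "  ?place a dbo:PopulatedPlace ;\n"
        f"         dbo:country <{country_iri}> ;\n"
        "         rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "en")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query):
    """Encode the query as a GET URL (SPARQL 1.1 Protocol 'query' parameter).

    The 'format' parameter is a convenience of the endpoint software; the
    standard alternative is to send an HTTP Accept header instead.
    """
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


query = build_place_query("http://dbpedia.org/resource/Italy", limit=5)
url = build_request_url(query)
print(url)
```

The resulting URL can be fetched with any HTTP client; for bulk enrichment, the dump listed as the second availability method is usually the better fit than many endpoint calls.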
6.21.5 Dataset SECURITY
The dataset does not contain personal data.
Table 130 DATASET SECURITY – DBpedia
Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.21.6 Ethics and Legal requirements
Based on the above dataset description, the dataset “DBpedia” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset.
6.22 LOD Dataset - Geographic: LinkedOpenStreetMaps
6.22.1 Dataset IDENTIFICATION
The dataset “LinkedOpenStreetMaps” is publicly available and contains an editable map of the whole world.
Table 131. DATASET IDENTIFICATION – LinkedOpenStreetMaps
Category: Geographic Dataset
Data name: LinkedOpenStreetMaps
Description: OpenStreetMap is built by a community of mappers that contribute and maintain data about roads, trails, cafés, railway stations, and much more, all over the world.
Provider: LOD - Access facilitated by UNIMIB
Contact Person: Andrea Maurino
Business Cases number: BC1, BC2, BC3, BC4
6.22.2 Dataset ORIGIN
The dataset is available from January 2017 and it cannot be defined as “core data”.
Table 132 DATASET ORIGIN – LinkedOpenStreetMaps
Available at (M): M1
Core Data (Y|N): N
Size: 5,027,330,590 GPS points
Growth: Not a fixed number
Type and format: Data normally comes in the form of XML formatted OSM files
Existing data (Y|N): Y
Data origin: http://planet.openstreetmap.org/planet/planet-latest.osm.bz2
6.22.3 Dataset FORMAT
The dataset has worldwide coverage and collects data in all languages.
Table 133 DATASET FORMAT – LinkedOpenStreetMaps
Dataset structure: XML
Dataset format: The two main formats used are PBF or compressed OSM XML. PBF is a binary format that is smaller to download and much faster to process and should be used when possible. Most common tools using OSM data support PBF.
Time coverage: up to date
Spatial coverage: Worldwide. All the nodes, ways and relations that make up our map
Languages: All languages
Identifiability of data: No
Naming convention: N/A
Versioning: Each week, a new and complete copy of all data in OpenStreetMap is made available as both a compressed XML file and a custom PBF format file. Also available is the 'history' file, which contains not only up-to-date data but also older versions of data and deleted data items.
Metadata standards: Yes: DBO, FOAF, SCHEMA.ORG, SKOS, etc.
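To make the OSM XML structure in Table 133 more concrete, the following Python sketch reads node elements and their tags from a small XML fragment. The fragment is hand-written illustrative data, not an excerpt from a real dump, and the function name `named_nodes` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written OSM XML fragment (illustrative values only).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="45.4642" lon="9.1900">
    <tag k="name" v="Example Cafe"/>
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="45.4650" lon="9.1910"/>
</osm>
"""


def named_nodes(osm_xml):
    """Yield (id, lat, lon, tags) for every node that carries a 'name' tag."""
    root = ET.fromstring(osm_xml)
    for node in root.iter("node"):
        # Each <tag k="..." v="..."/> child is one key/value annotation.
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if "name" in tags:
            yield (
                int(node.get("id")),
                float(node.get("lat")),
                float(node.get("lon")),
                tags,
            )


nodes = list(named_nodes(SAMPLE_OSM))
print(nodes)
```

A full planet file is far too large to parse in one call like this; an incremental parser such as `xml.etree.ElementTree.iterparse`, or a PBF reader as recommended in the table, would be needed in practice.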
6.22.4 Dataset ACCESS
The dataset is public and it is accessible to all the consortium members.
Table 134 MAKING DATA ACCESSIBLE – LinkedOpenStreetMaps
Dataset license: OpenStreetMap is open data, licensed under the Open Data Commons Open Database License (ODbL) by the OpenStreetMap Foundation (OSMF).
Availability (public|private): Public
Availability to EW-Shopp partners (Y|N): Y
Availability method: dump, keyword based
Tools to access: API/dump, SPARQL wrapper
Dataset source URL: http://wiki.openstreetmap.org/wiki/Use_OpenStreetMap
Access restrictions: No access restriction
Keyword/Tags: cities, towns, places, municipalities, etc.
Archiving and preservation: N/A
Table 135 MAKING DATA INTEROPERABLE – LinkedOpenStreetMaps
Data interoperability:
• N/A (Linked Open Data)
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities
6.22.5 Dataset SECURITY
The dataset does not contain personal data.
Table 136 DATASET SECURITY – LinkedOpenStreetMaps
Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.22.6 Ethics and Legal requirements
Based on the above dataset description, the dataset “LinkedOpenStreetMaps” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset.
6.23 LOD Dataset - Geographic: LinkedGeoData
6.23.1 Dataset IDENTIFICATION
The dataset “LinkedGeoData” is publicly available and contains geographic information for places, cities, countries, etc.
Table 137. DATASET IDENTIFICATION – LinkedGeoData
Category: Geographic Dataset
Data name: LinkedGeoData
Description: LinkedGeoData is an effort to add a spatial dimension to the Web of Data / Semantic Web. LinkedGeoData uses the information collected by the OpenStreetMap project and makes it available as an RDF knowledge base according to the Linked Data principles. It interlinks this data with other knowledge bases in the Linking Open Data initiative.
Provider: LOD - Access facilitated by UNIMIB
Contact Person: Andrea Maurino
Business Cases number: BC1, BC2, BC3, BC4
6.23.2 Dataset ORIGIN
The dataset is available from January 2017 and it cannot be defined as “core data”.
Table 138 DATASET ORIGIN – LinkedGeoData
Available at (M): M1
Core Data (Y|N): N
Size: 8.3 GB
Growth: Not a fixed number
Type and format: .nt
Existing data (Y|N): Y
Data origin: http://downloads.linkedgeodata.org/releases/
6.23.3 Dataset FORMAT
The dataset contains data up to November 2015, in English.
Table 139 DATASET FORMAT – LinkedGeoData
Dataset structure: N-Triples
Dataset format: .nt
Time coverage: up to November 2015
Spatial coverage: It consists of more than 3 billion nodes and 300 million ways, and the resulting RDF data comprises approximately 20 billion triples. The data is available according to the Linked Data principles and interlinked with DBpedia and GeoNames.
Languages: English
Identifiability of data: No
Naming convention: N/A
Versioning: No versioning
Metadata standards: Linked open geo vocabulary
6.23.4 Dataset ACCESS
The dataset is public and it is accessible to all the consortium members through a dump.
Table 140 MAKING DATA ACCESSIBLE – LinkedGeoData
Dataset license: ODbL
Availability (public|private): Public
Availability to EW-Shopp partners (Y|N): Y
Availability method: dump
Tools to access: dump
Dataset source URL: http://downloads.linkedgeodata.org/releases/
Access restrictions: No access restriction
Keyword/Tags: cities, towns, places, municipalities, etc.
Archiving and preservation: N/A
Table 141 MAKING DATA INTEROPERABLE – LinkedGeoData
Data interoperability:
• N/A (Linked Open Data)
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities
6.23.5 Dataset SECURITY
The dataset does not contain personal data.
Table 142 DATASET SECURITY – LinkedGeoData
Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
6.23.6 Ethics and Legal requirements
Based on the above dataset description, the dataset “LinkedGeoData” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset.
6.24 LOD Dataset - Geographic: GeoNames
6.24.1 Dataset IDENTIFICATION
The dataset “GeoNames” is publicly available and contains geographic information for places, cities, countries, etc.
Table 143. DATASET IDENTIFICATION – GeoNames
Category: Geographic Dataset
Data name: GeoNames
Description: The GeoNames geographical database is available for download free of charge under a Creative Commons attribution license. It contains over 10 million geographical names and consists of over 9 million unique features whereof 2.8 million populated places and 5.5 million alternate names. All features are categorized into one out of nine feature classes and further subcategorized into one out of 645 feature codes.
Provider: LOD - Access facilitated by UNIMIB
Contact Person: Andrea Maurino
Business Cases number: BC1, BC2, BC3, BC4
6.24.2 Dataset ORIGIN
The dataset is available from January 2017 and it cannot be defined as “core data”.
Table 144 DATASET ORIGIN – GeoNames
Available at (M): M1
Core Data (Y|N): N
Size: 10.6 GB zipped
Growth: Not a fixed number
Type and format: RDF
Existing data (Y|N): Y
Data origin: https://drive.google.com/file/d/0B1tUDhWNTjO-WEZZb2VwOG5vZkU/edit?usp=sharing/
6.24.3 Dataset FORMAT
The dataset collects data related to all countries.
Table 145 DATASET FORMAT – GeoNames
Dataset structure: RDF
Dataset format: RDF
Time coverage: up to date
Spatial coverage: All countries and points in degrees (long & lat)
Languages: English
Identifiability of data: No
Naming convention: No
Versioning: daily dump
Metadata standards: geonames vocab
6.24.4 Dataset ACCESS
The dataset is public and it is accessible to all the consortium members through a dump.
Table 146 MAKING DATA ACCESSIBLE – GeoNames
Dataset license: CC-BY 3.0 [60]
Availability (public|private): Public
Availability to EW-Shopp partners (Y|N): Y
Availability method: dump
Tools to access: dump
Dataset source URL: http://download.geonames.org/export/dump/
Access restrictions: No access restriction
Keyword/Tags: cities, towns, places, municipalities, etc.
Archiving and preservation: N/A
Table 147 MAKING DATA INTEROPERABLE – GeoNames
Data interoperability:
• N/A (Linked Open Data)
Standard vocabulary:
• Temporal ontologies
• Spatial ontologies and locations
• Wikipedia entities
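The files published at the dump URL in Table 146 are tab-separated text rather than RDF, so a consumer of the dump would typically start with a reader like the Python sketch below. The column list follows the layout documented in the GeoNames dump readme; the function name `read_geonames` and the sample row are assumptions made for this example, and the row values are invented.

```python
import csv
import io

# Column layout of the main GeoNames dump table (19 tab-separated fields).
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def read_geonames(handle):
    """Yield one dict per well-formed row of a tab-separated dump file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(GEONAMES_COLUMNS):
            continue  # skip rows that do not have the expected field count
        record = dict(zip(GEONAMES_COLUMNS, row))
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        record["population"] = int(record["population"] or 0)
        yield record


# Illustrative row with made-up values (not taken from the real dump).
sample_row = "\t".join([
    "1", "Exampleville", "Exampleville", "", "45.5", "9.2", "P", "PPL",
    "IT", "", "09", "", "", "", "12345", "", "120", "Europe/Rome",
    "2017-01-01",
])

# Feature class "P" is the class used for populated places.
places = [
    record
    for record in read_geonames(io.StringIO(sample_row + "\n"))
    if record["feature_class"] == "P"
]
print(places[0]["name"], places[0]["population"])
```

Filtering on the feature class and feature code columns is how the nine classes and 645 codes mentioned in Table 143 would be used to select, for example, only cities and towns.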
6.24.5 Dataset SECURITY
The dataset does not contain personal data.
Table 148 DATASET SECURITY – GeoNames
Personal Data (Y|N): N
Anonymized (Y|N|NA): N/A
Data recovery and secure storage: N/A
Privacy management procedures: N/A
PD at the source (Y|N): N
PD - anonymised during project (Y|N): N
PD - anonymised before project (Y|N): N
Level of Aggregation (for PD anonymized by aggregation): N/A
[60] https://creativecommons.org/licenses/by/3.0
6.24.6 Ethics and Legal requirements
Based on the above dataset description, the dataset “GeoNames” does not contain personal data; therefore, the national and European legal framework that regulates the use of personal data does not apply, and a copy of opinion is not required to be collected.
There are no ethical issues that can have an impact on sharing this dataset.
6.25 Mapping between Dataset and Business case
The following table shows which datasets are used by each business case.
Table 149 Mapping Dataset and Business case
id Dataset name Provider BC1 BC2 BC3 BC4
1 Purchase intent Ceneje X
2 Location analytics data (hourly) Measurence X
3 Location analytics data (daily) Measurence X
4 Customer Purchase History Big Bang X
5 Consumer Intent and Interaction Big Bang X
6 Location analytics data (weekly) Measurence X
7 Contact and Consumer Interaction History Browsetel X
8 MARS (historical data) ECMWF X X X X
9 Product attributes Ceneje X
10 Event Registry JSI X X X X
11 Consumer data GfK X
12 Sales data GfK X X X
13 Product attributes GfK X X X
14 Door counter data Measurence X
15 Product attributes Big Bang X
16 Products price history Ceneje X
17 Sales data Measurence X
18 Traffic sources (Bing) JOT X
19 Traffic sources (Google) JOT X
20 Twitter Trends JOT X
21 DBpedia LOD X X X X
22 LinkedOpenStreetMaps LOD X X X X
23 LinkedGeoData LOD X X X X
24 GeoNames LOD X X X X
Chapter 7 Storage and Re-use
7.1 Storage
Data in EW-Shopp will be exchanged and made available through a two-tier storage policy. The policy will consist of:
• Tier 1: a shared data space for exchanging raw input data between Consortium partners.
• Tier 2: structured data storage with integrated data based on the DataGraft platform, which will be used to produce the integrated data according to a shared data model.
Tier 1 will be implemented using a file or data sharing solution. It will use cloud hosting infrastructure services to enable easy access over the web. Data will be stored using a data hosting service and secure data sharing protocols to ensure that data are not compromised.
Tier 2 will be implemented based on the DataGraft platform, where the shared data model will be published and the output data will be imported into a database management system and registered in the catalogue, taking into account the user access restrictions for each dataset.
7.2 Backup and Recovery
Back-up and recovery mechanisms will be implemented on a case-by-case basis for each output dataset. Input datasets already have back-up and recovery in place (when needed) and are directly managed by the data providers; therefore, no backup and/or recovery mechanism for input datasets falls within the scope of the EW-Shopp platform.
The concrete data back-up and recovery mechanisms to be adopted at EW-Shopp platform level will be discussed in future versions of the Data Management Plan as they evolve throughout the project, or in other deliverables dealing with technical aspects (such as the detailed design of the platform or the business case implementation plans).
7.3 Data Archiving
The data used and produced during the project development will be updated each time they change during the project lifetime. For each dataset update, a reference document will also be produced. This document will report the changes of the dataset with respect to the previous version.
EW-Shopp datasets used in the demonstrator will be maintained for at least five years after project termination. Sensitive data preservation will follow the guidelines that the EW-Shopp consortium will provide during the project development.
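One possible shape for the reference document that accompanies each dataset update is a summary of added, removed and changed records. The Python sketch below is only an illustration under stated assumptions: it supposes that each version can be represented as a dictionary keyed by a record identifier, and the names `diff_versions`, `format_report` and the sample data are hypothetical, not part of the EW-Shopp platform.

```python
def diff_versions(previous, current):
    """Compare two dataset versions given as {record_id: record} dicts."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": sorted(curr_ids - prev_ids),
        "removed": sorted(prev_ids - curr_ids),
        # Records present in both versions whose content differs.
        "changed": sorted(
            rid for rid in prev_ids & curr_ids if previous[rid] != current[rid]
        ),
    }


def format_report(dataset_name, old_version, new_version, changes):
    """Render the change summary as a short plain-text reference document."""
    lines = [
        f"Dataset: {dataset_name}",
        f"Changes from version {old_version} to {new_version}",
    ]
    for kind in ("added", "removed", "changed"):
        lines.append(f"{kind}: {len(changes[kind])} record(s) {changes[kind]}")
    return "\n".join(lines)


# Made-up sample versions of a small dataset.
v1 = {"p1": {"price": 10}, "p2": {"price": 20}}
v2 = {"p1": {"price": 12}, "p3": {"price": 30}}

changes = diff_versions(v1, v2)
report = format_report("example-output", "1.0", "1.1", changes)
print(report)
```

For large datasets the same idea would be applied to record hashes or to database-level change logs rather than to in-memory dictionaries.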
7.4 Security
The EW-Shopp framework will ensure the secure storage and exchange of data in the project to protect against compromising of sensitive data. One of the main components that will be used for the EW-Shopp framework and setup of data is the DataGraft platform (tier 2). DataGraft security is implemented on several layers as follows:
1) User login – Account information is protected by a password, which is encrypted, and DataGraft does not store the non-encrypted version. Furthermore, current deployments of DataGraft use SSL certificates enabled through the CloudFront CDN on AWS. Other configurations of SSL are also possible if necessary;
2) OAuth2 – DataGraft uses a standard implementation of RFC 6749 – token-based authorisation layer for control of client access to resources;
3) API keys for database – The public API of the back-end database of DataGraft (Ontotext S4) is accessible through an API key, which can be created and managed by registered users of the platform; and
4) Encrypted cookies – Front-end cookies containing session information are exchanged between the web UI and the back-end. This cookie stores a session identifier and encrypted session data when users are logged into the DataGraft Portal.
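The first layer above, storing only a protected form of the password, is commonly realised with salted key derivation. The following Python sketch shows that general technique with PBKDF2-HMAC-SHA256 from the standard library; it is not DataGraft's actual implementation, and the function names and the iteration count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

# Illustrative work factor; NOT a value taken from any DataGraft deployment.
ITERATIONS = 200_000


def hash_password(password, salt=None):
    """Return (salt, derived_key) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every new password
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, key


def verify_password(password, salt, expected_key):
    """Re-derive the key from the candidate and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


# Only the salt and the derived key would be stored, never the password.
salt, stored_key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored_key))
print(verify_password("wrong password", salt, stored_key))
```

With this approach a leaked credential store does not directly reveal user passwords, which matches the stated goal that the non-encrypted version is never kept.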
Security will additionally be considered for the purposes of data exchange between partners (tier 1) and sharing before the final data integration/publication. The particular security measures will be taken on a case-by-case basis, based on the medium for data exchange and the precise needs of each data provider. They will include the following:
1) Setting up security policies on cloud service providers
2) Setting up a secure FTP server for file transfer of any files over the Internet
3) Setting up secret SSH keys for accessing servers/clusters of servers with running databases that host any shared dataset
7.5 Permission
Permission policies will be provided to make EW-Shopp compliant with privacy-preserving data management. The platform will provide authentication mechanisms that ensure data security, as stated in Section 7.4 (supported by the chosen data exchange medium in tier 1 and the DataGraft platform), in order to restrict access to data files to the research personnel involved in EW-Shopp development.
7.6 Access, Re-use and Licensing
The sharing conditions of each individual input dataset can be found in Chapter 6 under "Dataset ACCESS", together with the individual license for each of them. To this end, access will be provided to the whole EW-Shopp Consortium and exclusively for the project objectives.
Datasets produced as a result of the project work will be shared within the Consortium and will only be allowed for external sharing with a consensual Consortium approval of the relevant stakeholders, by accepting the terms and conditions of use, as appropriate. The license for the access, sharing and re-use of EW-Shopp material and output datasets will be defined by the Consortium on a case-by-case basis.
The research data will be presented in scientific publications that the consortium will write and publish during the funding period. Materials generated under the Project will be disseminated in accordance with the Consortium Agreement.
Annex A – DMP Survey
Here is the DMP survey containing the questions asked to data providers.
Topic Question
DATASET IDENTIFICATION
a. Name of the dataset. Specify a self-explaining name of the dataset.
b. Dataset owner/publisher/provider name. Specify the name of the beneficiary providing the dataset (or being in charge of bringing it into the project).
c. Contact person. Specify name and contacts of the person to be contacted for further details about the dataset.
d. State the purpose of the data collection/generation. Specify what the data is about and what the data is currently used for.
e. Explain the relation to the objectives of the project. Which is the related business case?
DATASET ORIGIN
a. Specify the types and formats of data generated/collected. Provide a high-level description of the data constituting the dataset.
b. Specify if existing data is being re-used (if any). Specify if the dataset is re-using existing data, and from where.
c. Specify the origin of the data. Specify how the data in the dataset is being collected/generated. Select one of the following categories: Web scraping, Web metering software, Derived from other datasets, Instrumentation or sensors, Administrative archives, Crowdsourcing, Survey or census, Other (add explanation), N/A.
d. State the expected size of the dataset (if known). Provide a ROM estimate in case of a static dataset in terms of MB/GB/TB, or provide a ROM estimate of a dynamic dataset by selecting the most appropriate frequency in terms of MB/GB/TB per hour/day/week/month/other.
DATASET FORMAT
a. Outline the dataset structure (metadata provision). Describe the structure and type of the data. For example, describe the header columns, describe the JSON schema, REST response fields, etc.
b. Outline the dataset format. Outline the dataset format, specifying if it is using, for example, CSV, Excel spreadsheet, XML, JSON, GeoJSON, Shapefile, HTTP stream, etc.
c. Time coverage. If the dataset has a time dimension, what period does it cover?
d. Spatial coverage. If the dataset relates to a spatial region, what is its coverage?
e. Language. Languages of metadata, attributes, code lists, descriptions.
f. Outline the identifiability of data and refer to standard identification mechanism. Do you make use of persistent and unique identifiers such as Digital Object Identifiers?
g. Outline naming conventions used. If the dataset is not static, describe how the dataset can be identified if updated or after a versioning task has been performed.
h. Outline the approach for versioning. How often is the data updated (No planned updating, Annually, Quarterly, Monthly, Weekly, Daily, Hourly, Every few minutes, Every few seconds, Real-time)? How is the versioning managed (e.g. if daily, every day a new dataset is generated with the newly created data, or every day a new dataset overrides the old one, containing all the data generated from the beginning of the collection, …)?
i. Specify standards for metadata creation (if any). If there are no standards in your discipline, describe what metadata will be created and how. If you annotate some metadata to your dataset, please specify if you are using any existing standard (e.g. if your dataset contains a text entry and you annotate the text with metadata belonging to a specific taxonomy, please specify the taxonomy you are referring to).
MAKING DATA ACCESSIBLE
a. Specify dataset license. If the dataset is released as open data, specify the license used: CC0, CC-BY, CC-BY-SA, CC-BY-ND, CC-BY-NC, CC-BY-NC-SA, CC-BY-NC-ND, PDDL, ODC-by, ODbL, Other or proprietary (please provide link if possible). Otherwise, specify who has access to the dataset (for example, all partners in the consortium, some partners for the purpose of tool development, only a sample will be disclosed, etc.).
b. Specify how the data will be made available. For example, web page in the browser, web service (REST/SOAP APIs), query endpoint, file download, DB dump, directly shared by the responsible organisation, etc.
c. Specify what methods or software tools are needed to access the data. Is documentation about the software needed to access the data included? Is it possible to include the relevant software (e.g. in open source code)?
d. Specify where the data and associated metadata, documentation and code are deposited (provide dataset source URL, if applicable).
e. Specify how access will be provided in case there are any restrictions.
f. Theme/tags. Categorize the dataset and provide some relevant keywords/tags. Select one of the following categories: "product categories", "price", "consumer electronics", "other (add explanation)".
g. Archiving and preservation (including storage and backup). Description of the procedures that will be put in place for long-term preservation of the data. Indication of how long the data should be preserved, what its approximate end volume is, what the associated costs are and how these are planned to be covered.
MAKING DATA INTEROPERABLE
a. Assess the interoperability of your data. Specify what data and metadata vocabularies, standards or methodologies you will follow to facilitate interoperability.
b. Specify whether you will be using standard vocabulary for all data types present in your data set, to allow inter-disciplinary interoperability. If not, will you provide mapping to more commonly used ontologies?
DATA SECURITY
a. Address data recovery as well as secure storage and transfer of sensitive data.
b. Refer to other national/funder/sectorial/departmental procedures for data management and/or privacy that you are using (if any).
c. Data privacy. Specify if the dataset includes personally identifiable information (PII). Are data anonymised? If so, what is the technique used? Does the data owner collect privacy permission to process data (and what are the limitations)? Does your data include IP addresses?