SHARING SENSITIVE DATA WITH CONFIDENCE: THE DATATAGS SYSTEM
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, IQSS, Harvard University
Michael Bar-Sinai, PhD candidate in Computer Science at the Ben-Gurion University of the Negev, Israel; Fellow at the Institute for Quantitative Social Science at Harvard University
Latanya Sweeney, Professor of Government and Technology in Residence; Director of Data Privacy Lab, Harvard University
Data sharing: good for you and good for the world
Researchers: Get credit for their data
Publishers and Journals: Verify published work
Federal funding agencies: Make public assets accessible
Science: Validate, reuse and extend previous work
dataverse.org
Open-source software developed at Harvard’s IQSS since 2006. Used to share, publish, cite and archive research data.
Installed in 12 sites worldwide. Serving 100s of universities and organizations.
Harvard Dataverse: dataverse.harvard.edu. Started as a community repository for Social Science. Now open to all research fields and all researchers.
More than 1300 dataverses. More than 59,000 datasets.
More than 1,500,000 downloads.
Data Repositories vs Repository Software
But, existing community repositories do not support sensitive data
“User Uploads must be void of all identifiable information, such that re-identification of any subjects from the amalgamation of the information available from all of the materials (across datasets and dataverses) uploaded under any one author and/or user should not be possible.”
“Submitter represents and warrants that the Content does not contain any information (i) which identifies, or which can be used in conjunction with other publicly available information to personally identify, any individual;”
“If you are submitting human sequences to GenBank, do not include any data that could reveal the personal identity of the source. It is our assumption that you have received any necessary informed consent authorizations that your organizations require prior to submitting your sequences.”
GenBank
HOW CAN WE MAXIMIZE SHARING SENSITIVE DATA WHILE BEING MINDFUL OF PRIVACY?
Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The DataTags System. Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601
A datatag is a set of security features and access requirements for file handling.
A datatags repository is one that stores and shares data files in accordance with standardized and ordered levels of security and access requirements.
A DataTags Repository must satisfy the following conditions:
1. Supports more than one datatag
2. Each file in the repository must have one and only one datatag
   a. additional requirements cannot weaken the file security
   b. and cannot require the same or more security than a more restrictive datatag
3. A recipient of a file from the repository must:
   a. satisfy the file’s access requirements,
   b. produce sufficient credentials as requested,
   c. and agree to any terms of use required to acquire the file.
4. Provides technological guarantees for requirements 1, 2 and 3.
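Condition 3 can be read as a simple gate that a repository evaluates before releasing a file. The following is a minimal Python sketch of that reading, not the DataTags implementation: the class names, field names, and the example requirement strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileRecord:
    """A repository file carrying exactly one datatag (condition 2)."""
    name: str
    tag: str                                    # e.g. "Yellow"
    access_requirements: frozenset = frozenset()
    required_credentials: frozenset = frozenset()
    terms_of_use: frozenset = frozenset()


@dataclass
class Recipient:
    """What a requesting user has satisfied, presented, and agreed to (hypothetical model)."""
    satisfied: set = field(default_factory=set)
    credentials: set = field(default_factory=set)
    accepted_terms: set = field(default_factory=set)


def may_receive(recipient: Recipient, record: FileRecord) -> bool:
    """Condition 3: parts (a), (b) and (c) must all hold before release."""
    meets_access = record.access_requirements <= recipient.satisfied        # 3a
    has_credentials = record.required_credentials <= recipient.credentials  # 3b
    agreed_to_terms = record.terms_of_use <= recipient.accepted_terms       # 3c
    return meets_access and has_credentials and agreed_to_terms


survey = FileRecord(
    name="survey.csv",
    tag="Yellow",
    access_requirements=frozenset({"registered", "approval"}),
    required_credentials=frozenset({"password"}),
    terms_of_use=frozenset({"click DUA"}),
)
alice = Recipient({"registered", "approval"}, {"password"}, {"click DUA"})
bob = Recipient({"registered"}, {"password"}, set())

print(may_receive(alice, survey), may_receive(bob, survey))
```

Condition 4 asks for technological guarantees, so in a real repository a check like this would have to sit on the only path by which a file can leave storage.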
Datatags Levels
Tag Type   Description            Security Features                        Access Requirements
Blue       Public                 Clear storage, Clear transmission        Open
Green      Controlled public      Clear storage, Clear transmission        Email, OAuth verified registration
Yellow     Accountable            Clear storage, Encrypted transmit        Password, Registered, Approval, Click DUA
Orange     More accountable       Encrypted storage, Encrypted transmit    Password, Registered, Approval, Signed DUA
Red        Fully accountable      Encrypted storage, Encrypted transmit    Two-factor authentication, Approval, Signed DUA
Crimson    Maximally restricted   MultiEncrypt store, Encrypted transmit   Two-factor authentication, Approval, Signed DUA
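Because the six levels are ordered, they map naturally onto an ordered enumeration. The sketch below encodes the table in Python under that assumption; the names `Tag`, `Handling`, `LEVELS`, and `minimum_tag_with` are hypothetical, and the lowercase storage and transit labels are shorthand for the table entries.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tag(IntEnum):
    """The six levels, ordered from least to most restrictive."""
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


@dataclass(frozen=True)
class Handling:
    description: str
    storage: str      # "clear", "encrypted" or "multi-encrypted"
    transit: str      # "clear" or "encrypted"
    access: tuple


LEVELS = {
    Tag.BLUE: Handling("Public", "clear", "clear", ("Open",)),
    Tag.GREEN: Handling("Controlled public", "clear", "clear",
                        ("Email, OAuth verified registration",)),
    Tag.YELLOW: Handling("Accountable", "clear", "encrypted",
                         ("Password", "Registered", "Approval", "Click DUA")),
    Tag.ORANGE: Handling("More accountable", "encrypted", "encrypted",
                         ("Password", "Registered", "Approval", "Signed DUA")),
    Tag.RED: Handling("Fully accountable", "encrypted", "encrypted",
                      ("Two-factor authentication", "Approval", "Signed DUA")),
    Tag.CRIMSON: Handling("Maximally restricted", "multi-encrypted", "encrypted",
                          ("Two-factor authentication", "Approval", "Signed DUA")),
}


def minimum_tag_with(storage: str) -> Tag:
    """Least restrictive tag whose storage feature equals the given label."""
    return min(tag for tag, handling in LEVELS.items() if handling.storage == storage)


print(minimum_tag_with("encrypted").name)
print(Tag.RED > Tag.YELLOW)
```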
DATATAGS WITH HARVARD DATAVERSE
Level 1: No sensitive data; open data
Level 1: De-identified data
Level 2: Confidential information by University standards; no material harm
Level 3: Confidential information that could cause material harm (non-level 4 FERPA)
Level 4: High risk confidential information (SSN)
Level 5: Information that would cause severe harm
DataTags vs Harvard Security Levels
Dataverses, Datasets, Data Files and DataTags
A Datatag is assigned to each Data File (not to the Dataset)
DataTags Workflow with Dataverse
Data File Ingestion
Sensitive Dataset
Direct Access
Privacy Preserving Access
Authorized Signed DUA
Automatic Interview
Review Board Approval
http://datatags.org http://privacytools.seas.harvard.edu
A Curator Model for Privacy-Preserving Analysis
Acknowledgement: Honaker, J. and Nissim, K., Data Privacy Tools Project
Differentially Private statistics (summaries, causal inference, regression, interactive queries)
Credentials and Retrieval in Dataverse
Data File not restricted; Guestbook – Email to access
Data File restricted; Dataverse/InCommon account; Request access; Click DUA
Data File restricted; Dataverse/InCommon account; Request access; Sign DUA
Data File restricted; InCommon account; Request access; Two-Factor authentication; Sign DUA
OTHER TYPES OF DATATAGS REPOSITORIES
Betty: Sole Researcher
• Received consent from participants
• Repository for sharing highly sensitive data (not necessarily Harvard Dataverse)
Betty: Global Research Repository
Ingestion and Decision-making Knowledge
IRB determination or an interview system.
Codification and Infrastructure
Blue, Green, Yellow, Orange, Red, Crimson.
Credentials and Retrieval
Different files may additionally require specific terms of use based on legal or regulatory requirements or adopted best practices.
(Same use case as Dataverse)
Adam: Large Medical Research Group
• Repository for sharing local data
• Repository for published data
• Repository for sharing with collaborators
Diane: Multinational Corporation
• Cloud contains data from all over the world, collected under a variety of terms, subject to different laws
• Repository that enforces requirements on employee access
Charles: Institutional Review Board
• Document committee decisions
• Recommend handling based on prior decisions
How technology impacts humans.
DATA
Direct Deposit, Direct Tagging
Khanna A. Facebook's Privacy Incident Response: a study of geolocation sharing on Facebook Messenger. Technology Science. 2015081101. August 11, 2015. http://techscience.org/a/2015081101
techscience.org
Published 2015-09-29
Sweeney L, Yoo J. De-anonymizing South Korean Resident Registration Numbers Shared in Prescription Data. Technology Science. 2015092901. September 29, 2015. http://techscience.org/a/2015092901
techscience.org
DATA TAGGING TOOLS
A Datatagging tool needs:
• Formal description of a Datatag
  – Capture the data handling policy of the tag
  – Capture the “stricter-than” ordering
• Interview creation tool
  – Support user-friendly interviews
  – Decide on the datatag based on the answers only
Formal Description of a Datatag
• Model data handling policies as a set of orthogonal aspects
  – Storage encryption, access requirements…
• Describe implementation options for each aspect; order implementations from lenient to strict
  – Clear < Encrypted < MultiEncrypt
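One way to picture this formal description is as a product of ordered aspects, where the “stricter-than” relation compares policies aspect by aspect. The Python sketch below assumes exactly that; only the storage ordering (Clear < Encrypted < MultiEncrypt) comes from the slide, while the `transit` aspect, the function names, and the componentwise comparison are illustrative assumptions rather than the Tags language itself.

```python
# Each aspect lists its implementation options from lenient to strict.
ASPECTS = {
    "storage": ("Clear", "Encrypted", "MultiEncrypt"),  # ordering from the slide
    "transit": ("Clear", "Encrypted"),                  # assumed second aspect
}


def rank(aspect: str, value: str) -> int:
    """Position of a value within its aspect's lenient-to-strict ordering."""
    return ASPECTS[aspect].index(value)


def at_least_as_strict(a: dict, b: dict) -> bool:
    """True if policy a is at least as strict as policy b on every aspect."""
    return all(rank(k, a[k]) >= rank(k, b[k]) for k in ASPECTS)


def stricter_than(a: dict, b: dict) -> bool:
    """Strict version of the ordering: at least as strict everywhere, and not equal."""
    return at_least_as_strict(a, b) and a != b


def strictest_of(a: dict, b: dict) -> dict:
    """Most lenient policy that is at least as strict as both a and b."""
    return {k: max(a[k], b[k], key=lambda v: rank(k, v)) for k in ASPECTS}


p = {"storage": "Encrypted", "transit": "Clear"}
q = {"storage": "Clear", "transit": "Encrypted"}

print(stricter_than(p, q), stricter_than(q, p))  # neither dominates the other
print(strictest_of(p, q))
```

With orthogonal aspects the comparison is only a partial order: `p` and `q` above are incomparable, and `strictest_of` gives a policy that satisfies both.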
Data Handling Policy Space
Tags: Tags Space file (.ts)
• Describe a tag space
• Convenience features: hierarchy, “slots” of different types, top-down design support, comments…
Screenshot from actual Atom package: Gal Maman, Matan Toledano, BGU
Comprehension Aid: Visualization
Finding the Right Tag – Decision Graph
• Directed, Acyclic Graph
• Node Types:
  – Ask
  – Set
  – Convenience: Call, End, Reject, Todo
Screenshot from actual Atom package: Gal Maman, Matan Toledano, BGU
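To make the node types concrete, here is a minimal Python sketch of an interpreter that walks a decision graph built from Ask, Set, End and Reject nodes. It is not the Tags decision-graph language: the classes, the `then` field, the node ids, and the two interview questions are hypothetical, and the Call and Todo convenience nodes are left out.

```python
from dataclasses import dataclass


@dataclass
class Ask:
    question: str
    answers: dict          # answer text -> id of the next node


@dataclass
class Set:
    assignments: dict      # aspect -> value written into the tag
    then: str              # id of the next node


@dataclass
class End:
    pass


@dataclass
class Reject:
    reason: str


# Hypothetical two-question interview.
GRAPH = {
    "start": Ask("Does the data contain personal information?",
                 {"no": "public", "yes": "consent"}),
    "public": Set({"storage": "Clear", "transit": "Clear"}, "end"),
    "consent": Ask("Did participants consent to sharing?",
                   {"yes": "protect", "no": "refuse"}),
    "protect": Set({"storage": "Encrypted", "transit": "Encrypted"}, "end"),
    "refuse": Reject("Data cannot be shared without consent."),
    "end": End(),
}


def run_interview(graph: dict, answers: list, start: str = "start") -> dict:
    """Walk the graph, consuming one answer per Ask node, and return the resulting tag."""
    tag = {}
    remaining = iter(answers)
    node_id = start
    while True:
        node = graph[node_id]
        if isinstance(node, Ask):
            node_id = node.answers[next(remaining)]
        elif isinstance(node, Set):
            tag.update(node.assignments)
            node_id = node.then
        elif isinstance(node, Reject):
            raise ValueError(node.reason)
        else:  # End
            return tag


print(run_interview(GRAPH, ["no"]))
print(run_interview(GRAPH, ["yes", "yes"]))
```

Because the tag is built only from Set nodes reached through the given answers, the result depends on the answers alone, which matches the requirement that the datatag be decided from the interview answers only.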
Interview Visualization
Interview credit: The Data Privacy Lab @ Harvard (Latanya Sweeney, Sean Hooley), Berkman Ctr. for Internet and Society (Alexandra Wood, David O'Brien, clinical students), IQSS (Mercè Crosas, Michael Bar-Sinai). Part of the Privacy tools for sharing research data project
Interview on the Web
Interview available at datatags.org
Decision Graph Points
• Familiar “interview with a specialist” metaphor
• Implicitly describe logic inference
Decision Graph Points
• Analysis: Detection of independent parts
• Queries, such as “what series of answers will create a datatag that allows clear storage?”
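Since the decision graph is acyclic, a query like the one above can be answered by enumerating every path through the graph and keeping the answer sequences whose final tag has the desired value. The Python sketch below illustrates this with a hypothetical tuple-based graph encoding; the node ids, questions, and function names are assumptions made for the example.

```python
# Hypothetical graph: ("ask", question, {answer: next}), ("set", {aspect: value}, next), ("end",)
GRAPH = {
    "start": ("ask", "Does the data contain personal information?",
              {"no": "public", "yes": "harm"}),
    "public": ("set", {"storage": "Clear"}, "end"),
    "harm": ("ask", "Could disclosure cause material harm?",
             {"no": "mild", "yes": "strict"}),
    "mild": ("set", {"storage": "Clear"}, "end"),
    "strict": ("set", {"storage": "Encrypted"}, "end"),
    "end": ("end",),
}


def traces(graph, node_id="start", answers=(), tag=None):
    """Yield (answers, tag) for every complete path; terminates because the graph is acyclic."""
    tag = dict(tag or {})          # copy so sibling branches do not share state
    node = graph[node_id]
    kind = node[0]
    if kind == "ask":
        for answer, target in node[2].items():
            yield from traces(graph, target, answers + (answer,), tag)
    elif kind == "set":
        tag.update(node[1])
        yield from traces(graph, node[2], answers, tag)
    elif kind == "end":
        yield answers, tag


def answers_leading_to(graph, aspect, value):
    """All answer sequences whose resulting tag has the given value for the aspect."""
    return [a for a, tag in traces(graph) if tag.get(aspect) == value]


print(answers_leading_to(GRAPH, "storage", "Clear"))
```

Exhaustive enumeration is fine for small interviews; a large graph with many independent parts would likely need something smarter, which is where the analysis and optimization points on these slides come in.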
Decision Graph Points
• Optimizations
Example created by Eyal Ben-Simon, BGU
State of the Tags Tool
• Open-source project at GitHub
• Language getting more tools and features
  – Project with BGU students
• Language Tools in progress
  – Inspectors, Visualizers, CLI development tool
• Tutorials and reference: datatagginglibrary.readthedocs.org
• Collaboration via, e.g. GitHub
Future of the Tags Tool
• Update web interview application
  – Include upload and inspection features
• On-line collaboration environment
  – A la Google Docs?
THANKS
Mercè Crosas, Michael Bar-Sinai, Latanya Sweeney