confidential computing - analysing data without seeing data
TRANSCRIPT
www.csiro.au
Data Analytics WITHOUT Seeing the Data
Max Ott … with input from the entire N1 Team
max.ott@data61.csiro.au
Future Value of Data
[Figure: value of data (vertical axis) against time (horizontal axis), starting at release]
Data decays with time!
Joined with another data set – more value!!
New analytics techniques – more value!!
Data decay
+ Joining new data
+ New analytics techniques
Uncertain future value
Unknown future risk
Challenge
[Diagram: Confidential data → Computation → Result]
Learn this! (the result)
Learn NOTHING (about the confidential data)
The Problem
How can we learn valuable insights from sensitive data from multiple organisations?
[Diagram: two confidential sets of sensitive data feed a joint analysis that produces insights]
Three Basic Building Blocks
• Private computation
  • Arithmetic on encrypted numbers
• Distributed, confidential analytics
  • Distributed algorithms, computation & protocols
• Private Record Linkage
  • Privacy preserving record level matching
Solution (1): Private computation
[Diagram: the numbers 3 and 2 are each encrypted (E) into very long ciphertext integers. The plaintext operation + corresponds to an operation “+” on the two ciphertexts. Decrypting (D) the combined ciphertext gives 5.]
Solution (2): Distributed analytics
[Diagram: Dept 1 and Dept 2 each hold their own data and compute inside a confidentiality boundary. An N1 Coordinator performs secure compute and exchanges messages containing encrypted data with them.]
Data always remains confidential to the source institution
Solution (3): Private Record Linkage
[Diagram: records from Dataset A and Dataset B; do they refer to the same person?]
Tori Mckone 7/06/1921 F
Tori Mackon 6/07/1921 F
Victoria Mckon 7/06/1921 F
Use Cases
Scoring
[Diagram: a Model built from Own Data and Other Data; Quality ??]
Suspicious Activities
Need to report?
[Diagram: Model Builder]
Industry using Gov Data
[Diagram: a Model Builder combining Own Data and Gov Data]
Benchmarking
[Diagram: Own Data and a Model Builder]
Device Analytics
Private Modeling
Model of normal behaviour: learn, then deploy
[Diagram: device readings marked OK or NG]
Private Computation
Homomorphic encryption
• Partial Homomorphic Encryption: allows either addition or multiplication of encrypted numbers
• Somewhat Homomorphic Encryption: allows evaluation of low order polynomials
• Fully Homomorphic Encryption: allows evaluation of arbitrary functions
[Arrows: the schemes become more general from partial to fully homomorphic, and faster in the opposite direction]
Paillier Encryption
Encryption of m: $c = g^{m} r^{n} \bmod n^2$
Addition of encrypted numbers: $D\big(E(m_1) \cdot E(m_2) \bmod n^2\big) = m_1 + m_2 \bmod n$
Multiplication of encrypted number by a scalar: $D\big(E(m_1)^{m_2} \bmod n^2\big) = m_1 m_2 \bmod n$
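A short worked derivation may help show why these two identities hold. It assumes the encryption formula from the slide and writes the random factors of the two ciphertexts as $r_1$ and $r_2$ (names introduced here for illustration only):

```latex
\begin{align*}
E(m_1) \cdot E(m_2)
  &= g^{m_1} r_1^{n} \cdot g^{m_2} r_2^{n} \bmod n^2 \\
  &= g^{m_1 + m_2} \, (r_1 r_2)^{n} \bmod n^2 \\[4pt]
E(m_1)^{m_2}
  &= \left( g^{m_1} r_1^{n} \right)^{m_2} \bmod n^2 \\
  &= g^{m_1 m_2} \, \left( r_1^{m_2} \right)^{n} \bmod n^2
\end{align*}
```

Both right-hand sides again have the form $g^{m} r^{n} \bmod n^2$, so they are valid encryptions of $m_1 + m_2$ and $m_1 m_2$ (taken modulo $n$), just with a different random factor.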
Both homomorphic properties follow from the exponent rules:
Addition of encrypted numbers: $g^{m_1} \times g^{m_2} = g^{m_1 + m_2}$
Multiplication of encrypted number by a scalar: $\left(g^{m_1}\right)^{m_2} = g^{m_1 m_2}$
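The Paillier operations on the slides can be tried out with a minimal, standard-library-only sketch. This is a toy under stated assumptions, not the API of python-paillier or javallier: the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`, `mul_by_scalar`) are hypothetical, the generator is fixed to the common simple choice g = n + 1, and the two primes are small Mersenne primes that are far too short for real use.

```python
import math
import secrets

# Toy primes (both are Mersenne primes). Assumption for illustration only:
# real deployments use randomly generated primes of 1024 bits or more.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    """Return ((n, g), (lam, mu)): public and private key."""
    n = p * q
    assert math.gcd(n, (p - 1) * (q - 1)) == 1
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                  # simple generator choice
    mu = pow(lam, -1, n)       # modular inverse; this shortcut relies on g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """c = g^m * r^n mod n^2, with a fresh random r for every call."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(pub, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return c1 * c2 % (n * n)

def mul_by_scalar(pub, c, k):
    """Raising a ciphertext to a plaintext power k multiplies the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)

pub, priv = keygen()
c3, c2 = encrypt(pub, 3), encrypt(pub, 2)
print(decrypt(pub, priv, add_encrypted(pub, c3, c2)))  # 5
print(decrypt(pub, priv, mul_by_scalar(pub, c3, 7)))   # 21
```

The party doing the arithmetic only ever calls `add_encrypted` and `mul_by_scalar`, which need the public key alone; only the private key holder can call `decrypt`.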
Paillier Implementations
• Python – open source
  • www.github.com/nicta/python-paillier
• Java – open source
  • www.github.com/nicta/javallier
• Javascript – still under closed development
Distributed, Confidential Analytics
Distributed Computing with a Twist
[Diagram: Org 1 and Org 2 each hold their own data and compute inside a confidentiality boundary. An N1 Coordinator performs secure compute and exchanges messages containing ONLY encrypted data with them.]
Data always remains confidential to the source organisation
Graph Computation Engine
[Diagram: a Coordinator and several Workers spread across Domains. Each node runs a CE, Workers hold DFs, and nodes exchange messages (M). Further labels: Properties, Messages.]
Legend:
M: JSON Message
CE: AKKA actors
DF: Data frames
N1 Analytics Platform
[Diagram: platform stack]
• Privacy Technologies: Partial homomorphic encryption, Private Record Linkage, Irreversible aggregation
• Distributed Graph Computation Engine
• Analytics: Statistics, Regression, Clustering
• Machine Learning: Learn, Evaluate, Deploy
• Data, Auth, Network
Logistic Regression
Evaluate the logistic function: $p(x;\theta) = \dfrac{1}{1 + e^{-\theta \cdot x}}$
Log likelihood: $L(\theta) = \sum_{i=0}^{n} \Big[ y_i \log p(x_i;\theta) + (1 - y_i) \log\big(1 - p(x_i;\theta)\big) \Big]$
Minimise $-L(\theta)$ for $\theta$ (equivalently, maximise the log likelihood)
Requires “secure log” and “secure inverse” protocol using Paillier encryption
Builds on Han et al. 2010 “Privacy Preserving Gradient Descent Methods”
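To make the two formulas concrete, here is a plaintext sketch of logistic regression fitted by gradient ascent on the log likelihood. It is not the encrypted protocol from the talk and not the method of Han et al.; the function names, the toy data, the learning rate and the step count are all assumptions chosen for illustration. The comments mark where an inverse and a logarithm appear, which is presumably where an encrypted version would need the “secure inverse” and “secure log” protocols.

```python
import math

def predict(theta, x):
    """Logistic function p(x; theta) = 1 / (1 + exp(-theta . x))."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))   # the inverse appears here

def log_likelihood(theta, xs, ys):
    """L(theta) = sum_i y_i log p_i + (1 - y_i) log(1 - p_i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(theta, x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)  # the logs appear here
    return total

def fit(xs, ys, lr=0.05, steps=2000):
    """Gradient ascent on L(theta); the gradient is sum_i (y_i - p_i) * x_i."""
    theta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = y - predict(theta, x)
            for j, xj in enumerate(x):
                grad[j] += err * xj
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical toy data: the first column is a constant bias term.
xs = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]]
ys = [0, 0, 1, 0, 1, 1]

theta = fit(xs, ys)
print("theta:", theta)
print("log likelihood:", log_likelihood(theta, xs, ys))
print("score for x = 4.5:", predict(theta, [1.0, 4.5]))
```

Note that the gradient itself needs only additions and multiplications by the feature values once the prediction is known, which fits the operations Paillier supports on encrypted numbers.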
Example Paillier Logistic Regression
[Diagram: Org A and Org B, one holding features & labels and the other holding features only, connected through N1Analytics. Labelled components: Coordinator, Worker, Logistic Learner, Gradient Descent, Secure Log, Secure Inverse, Private key holder.]
Legend:
M: JSON Message
CE: AKKA actors
DF: Data frames
Performance
• Learning
  • Learnt models have the same accuracy as unencrypted calculations
  • “Private learning” is (1000x) slower due to encrypted computations. Learning times are several hours.
• Deployment
  • A score can be generated in real time (<50ms)
  • Customer data that contributes to the score remains private.
Scaling
[Diagram: a Coordinator with Data Provider 1 and Data Provider 2, each backed by many Workers]
[Figure: Learning time scaling. Learning time in minutes (log scale, ticks from 5 to 500) against number of cores (0 to 400), for three data set sizes: 10,000x10 features, 100,000x10 features and 1,000,000x10 features]
Confidential Record Linkage
Record Linkage Challenge
[Diagram: records from Dataset A and Dataset B; do they refer to the same person?]
Tori Mckone 7/06/1921 F
Tori Mackon 6/07/1921 F
Victoria Mckon 7/06/1921 F
Solution (3): Private Record Linkage
Each side applies one-way hash functions to its names; fuzzy matching is then done on the hashes.

Name        One-way hash    Name        One-way hash
JaneDoe     a8bf342         JanetDoe    a8bf242
PaulDoe     f72630b         BobDoe      b3894f3
JimClark    14oe54          JimClark    14oe54
KateClark   a72bef4         KatClark    672bef4
ShanBo      7830530         ShanBo      7830530
RegPal      4bf6021         JoeSmith    80ac364
Private Record Linkage
[Diagram: Company A and Company B each pass their Personally Identifiable Information through a Hasher, keyed with a Shared Secret Salt, to produce an Anonymous Bloom filter. N1 runs a Fuzzy Matcher over the Bloom filters and produces a Linkage Table.]
PII cannot be recovered from the hashes
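The hasher and fuzzy matcher can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the N1 implementation: names are split into character bigrams, each bigram sets a few positions in a Bloom filter through HMAC-SHA256 keyed with the shared secret salt, and two filters are compared with the Dice coefficient. The filter length, the number of hash functions, the example salt and all function names are hypothetical.

```python
import hashlib
import hmac

FILTER_BITS = 1024   # hypothetical Bloom filter length
NUM_HASHES = 4       # hypothetical number of hash functions per bigram

def bigrams(text):
    """Character bigrams of the normalised text, padded at both ends."""
    s = "_" + text.lower().replace(" ", "") + "_"
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bloom_encode(text, secret):
    """Return the set of bit positions set in the Bloom filter for `text`."""
    positions = set()
    for gram in bigrams(text):
        for k in range(NUM_HASHES):
            # Keyed one-way hash: without the shared secret, the positions
            # cannot be recomputed from a guessed name.
            msg = f"{k}:{gram}".encode()
            digest = hmac.new(secret, msg, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:4], "big") % FILTER_BITS)
    return positions

def dice(a, b):
    """Dice coefficient of two filters: 1.0 for identical, near 0 for unrelated."""
    return 2 * len(a & b) / (len(a) + len(b))

secret = b"shared-secret-salt"          # agreed between the two organisations
a = bloom_encode("Tori Mckone", secret)  # held by one organisation
b = bloom_encode("Tori Mackon", secret)  # held by the other
c = bloom_encode("Joe Smith", secret)

print("Tori Mckone vs Tori Mackon:", round(dice(a, b), 2))
print("Tori Mckone vs Joe Smith:  ", round(dice(a, c), 2))
```

Similar names share most of their bigrams and therefore most of their set bits, so the matcher can score candidate pairs and keep those above a chosen threshold while only ever seeing the filters.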
Private Record Linkage
[Diagram: Organisation A and Organisation B each run a Hasher keyed with a Shared Secret Salt; N1 Analytics runs the Fuzzy Matcher on the resulting hashes and produces the Linkage Table]

A's PII data
Name             DOB        Gender
John Smith       12/01/82   M
Mark Gorgon      1/12/90    M
Hanna Smith      4/02/78    F
…                …          …
Juliet Baker     2/11/72    F

B's PII data
Name             DOB        Gender
Mark Gorgon      1/12/90    M
Juliet Baker     2/11/72    F
Andrew Roberts   4/02/93    M
…                …          …
Hanna Smith      4/02/78    F

A's Cryptographic Hashes
Row      Key
1        10110110...00101010
2        01110110...11010101
3        10011001...10100110
…        …
100000   01101011...00101101

B's Cryptographic Hashes
Row      Key
1        01110110…11010101
2        01101011...00101101
3        01111000…00110011
…        …
100000   10011101...10100111

Linkage Table
Row A    Row B
1        X
2        1
3        100000
…        …
100000   X

Similar in approach to MERLIN - Ranbaduge, Vatsalan, Christen (2015)
Probabilistic Record Linkage
Common categorical features (e.g. postcode, age range, gender)
Record linkage can be a privacy issue
Classification without identity linking
[Diagram labels: Features, Labels, Rados, Features, Shared feature, Labels*, Label Proportions]
Learning from Label Proportions
Patrini, Nock, Caetano, & Rivera, NIPS (2014), (Almost) No label no cry
[Diagram labels: Features, Labels, Rados, Features, Shared feature, Labels*, Encrypted Label Proportions]
Learning from Encrypted Label Proportions
Current Status
Current Capabilities of N1 platform
• Standard data analytics techniques on confidential data:
  • Correlation analysis
  • Classification / prediction
  • Regression
  • Clustering / outlier detection
• Automated private record linkage
• Fine grained authorisation and access control
[Diagram: Dept 1, Org 2 and Comp 3 connected by private record linkage and private analytics (Statistics, Classifiers, Anomaly Detection)]
Federated model – No central database
Data is kept local to the source
Beta program
• Not open sourced (yet!)
• Looking for partners who want to use our system in their applications
• Still some warts, but working in commercial setting
Acknowledgements
Engineering: Mr. Brian Thorne, Dr. Mentari Djatmiko, Dr. Guillaume Smith, Dr. Wilko Hanecka, Dr. Hamish Ivey-Law
Research: Dr. Richard Nock, Mr. Giorgio Patrini, Dr. Roksana Borelli, Dr. Arik Friedman, Prof. Hugh Durrant-Whyte
Business: Mr. Warren Bradey, Ms. Shelley Copsey
Lead: Dr. Stephen Hardy