Reproducible research in biometrics - moving to the BEAT...
TRANSCRIPT
1 / 46
Reproducible research in Biometrics: Moving to the BEAT
Sebastien Marcel, Idiap Research Institute
Switzerland, www.idiap.ch/˜marcel
International Conference on Biometrics (ICB), Phuket – May 21, 2015
2 / 46
Then a miracle occurs !
How many times?
3 / 46
Then a miracle occurs !
Came across a publication and openly decided to ignore it because it would be too hard to apply it to your research?
3 / 46
Then a miracle occurs !
Worked day and night to incorporate some results into your own work but:
• There were untold parameters that needed adjustment and you couldn't get hold of them?
• Realized the proposed algorithm worked only on the specific data shown in the original paper?
• Realized that something did not quite add up in the end?
3 / 46
Don’t touch anything !
Had to take over the work from a student or another colleague that left and had to start from scratch - months into programming to make things work again?
4 / 46
Don’t touch anything !
Would have liked to reply to someone about your work, but you couldn't really remember all the details from when you first made it work? Or you could not make it work at all?
4 / 46
Scientific publications
SUPPLEMENTAL MATERIAL 1
A Scalable Formulation of Probabilistic Linear Discriminant Analysis: Applied to Face Recognition
Laurent El Shafey, Chris McCool, Roy Wallace, and Sebastien Marcel
APPENDIX A
MATHEMATICAL DERIVATIONS
The goal of the following section is to provide more detailed proofs of the formulae given in the article for both training and computing the likelihood.
The following proofs make use of a formulation of the inverse of a block matrix that uses the Schur complement. The corresponding identity can be found in [1] (Equations 1.11 and 1.10),

$$\begin{bmatrix} L & M \\ N & O \end{bmatrix}^{-1} = \begin{bmatrix} R & -RMO^{-1} \\ -O^{-1}NR & O^{-1} + O^{-1}NRMO^{-1} \end{bmatrix}, \qquad (51)$$

where we have substituted $R = \left(L - MO^{-1}N\right)^{-1}$.

Another related expression is the Woodbury matrix identity (Equation C.7 of [2]), which states that,

$$(L + MON)^{-1} = L^{-1} - L^{-1}M\left(O^{-1} + NL^{-1}M\right)^{-1}NL^{-1}. \qquad (52)$$
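Identity (52) is straightforward to check numerically; a minimal sketch with random, conformable matrices (the dimensions and seed are arbitrary, chosen only so every matrix involved is well-conditioned):

```python
import numpy as np

# Numerical sanity check of the Woodbury identity (52):
# (L + M O N)^{-1} = L^{-1} - L^{-1} M (O^{-1} + N L^{-1} M)^{-1} N L^{-1}
rng = np.random.default_rng(42)
n, k = 5, 2
L = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # well-conditioned n x n
M = rng.standard_normal((n, k))
O = np.eye(k) + 0.1 * rng.standard_normal((k, k))  # invertible k x k
N = rng.standard_normal((k, n))

lhs = np.linalg.inv(L + M @ O @ N)
Li = np.linalg.inv(L)
rhs = Li - Li @ M @ np.linalg.inv(np.linalg.inv(O) + N @ Li @ M) @ N @ Li
assert np.allclose(lhs, rhs)
```

The identity is useful exactly when k is much smaller than n: only a k x k system needs to be solved instead of an n x n one.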
A. Scalable training
The bottleneck of the training procedure is the expectation step (E-Step) of the Expectation-Maximization algorithm. This E-Step requires the computation of the first and second order moments of the latent variables.
1) Estimating the first order moment of the Latent Variables: The most computationally expensive part when estimating the latent variables is the inversion of the matrix $\mathring{P}$ (Equation (27)). This matrix is block diagonal, the two blocks being $P_0$ (Equation (28)) and (a repetition of) $P_1$ (Equation (29)),

$$\mathring{P} = \begin{bmatrix} P_0 & 0 & \cdots & 0 \\ 0 & P_1 & \ddots & 0 \\ 0 & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & P_1 \end{bmatrix}. \qquad (53)$$
L. El Shafey is with Idiap Research Institute and Ecole Polytechnique Federale de Lausanne, Switzerland. E-mail: [email protected]
C. McCool, R. Wallace and S. Marcel are with Idiap Research Institute, Martigny, Switzerland. E-mail: {christopher.mccool,roy.wallace,sebastien.marcel}@idiap.ch
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7) under grant agreements 238803 (BBfor2) and 257289 (TABULA RASA).

The inverse of $P_1$ is equal to the matrix $G$, defined by (30). This matrix is of constant size ($D_G \times D_G$), irrespective of the number of training samples for the class. In addition, the inversion of $P_0$ can be further optimised using the block matrix inversion identity introduced at the beginning of this section, leading to
$$P_0^{-1} = \begin{bmatrix} F_{J_i} & \sqrt{J_i}\,H^T \\ \sqrt{J_i}\,H & \left(I_{D_G} - J_i H F^T \Sigma^{-1} G\right) G \end{bmatrix}, \qquad (54)$$

where $F_{J_i}$ is defined by (33) and $H$ by (37).
Then, the computation of $\mathring{P}^{-1}\mathring{A}^T\Sigma^{-1}$ gives a block diagonal matrix, the first block being

$$\begin{bmatrix} \sqrt{J_i}\,F_{J_i} F^T S \\ G G^T \Sigma^{-1}\left(I_{D_x} - J_i F F_{J_i} F^T S\right) \end{bmatrix},$$

and the other ones being equal to $G G^T \Sigma^{-1}$.

As explained in section III.B.a of the article, $h_i$ corresponds to the upper sub-vector of $\mathring{y}_i$ and is not affected by the change of variable, as depicted in (21). Therefore, the first order moment of $h_i$ is directly obtained by multiplying the first block-rows of the matrix $\mathring{P}^{-1}\mathring{A}^T\Sigma^{-1}$ with $\mathring{x}_i$, which gives (31).

Considering only the $\mathring{w}_i$ (lower) sub-vector of $\mathring{y}_i$, the corresponding (lower) part $\mathring{B}$ of the matrix $\mathring{P}^{-1}\mathring{A}^T\Sigma^{-1}$ can be decomposed into a sum of two matrices, the first one being sparse with a single non-zero block (upper left) equal to $B_0 = -J_i G G^T \Sigma^{-1} F F_{J_i} F^T S$, and the second one being diagonal by blocks with identical blocks $B_1 = G G^T \Sigma^{-1}$,
$$\mathring{B} = \begin{bmatrix} B_0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} B_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & B_1 \end{bmatrix}. \qquad (55)$$
Furthermore, the first order moment of the variables $w_i$ is given by

$$E\left[w_i \mid x_i, \Theta\right] = \left(U^T \otimes I_{D_G}\right)\begin{bmatrix} B_0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\mathring{x}_i + \left(U^T \otimes I_{D_G}\right)\begin{bmatrix} B_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & B_1 \end{bmatrix}\left(U \otimes I_{D_x}\right)x_i. \qquad (56)$$
The previous decomposition greatly simplifies the computation, and leads to the following expression for each $w_{i,j}$,

$$E\left[w_{i,j} \mid x_i, \Theta\right] = G G^T \Sigma^{-1} x_{i,j} - G G^T \Sigma^{-1} F F_{J_i} F^T S \sum_j x_{i,j}. \qquad (57)$$
JOURNAL TO BE DEFINED, VOL. X, NO. T, JANUARY 2013, page 12

[Figure omitted. Fig. 10: Score distributions of baseline face verification systems; the full green line shows how SFAR changes with moving the threshold. Panels (verification scores vs. normalized count, with SFAR (%) overlaid): (a) GMM, SFAR 91.5%; (b) LGBPHS, SFAR 88.5%; (c) GGG, SFAR 95.0%; (d) ISV, SFAR 92.6%.]
[Figure omitted. Fig. 12: EPSC for comparison of fusion techniques of baselines with the LBP anti-spoofing algorithm. Top row, HTER_ω (%) vs. weight ω: (a) GMM, (b) LGBPHS, (c) GGG, (d) ISV. Bottom row, SFAR (%) vs. weight ω: (e) GMM, (f) LGBPHS, (g) GGG, (h) ISV.]
D. Performance of fused systems

In our last experiment, we compare the four face verification systems when fused with ALL counter-measures using the PLR fusion scheme. Firstly, we illustrate how fusion changes the score distribution for each of them separately in Figure 14. Then, in Figure 15 we compare which of the fused systems performs the best.

While Figure 10 shows that the spoofing attacks of Replay-Attack are in the optimal category when fed to the baseline face verification systems, Figure 14 illustrates that their effectiveness has vastly changed after fusion. The score distribution of the spoofing attacks is now mostly overlapping with the score distribution of the zero-effort impostors, allowing for better discriminability between the positive class and the two negative classes. The results reflect this observation: even when the threshold is obtained using the licit scenario, SFAR has dropped to less than 6%.

The comparison between the EPSC curves given in Figure 11(a) and Figure 15(a) confirms the above observations:
[Figure omitted. Fig. 15: EPSC curves to compare fused systems: (a) HTER_ω (%) vs. weight ω; (b) SFAR (%) vs. weight ω.]
while HTER_ω increases rapidly with ω and reaches up to 25% for some of the baseline systems, it increases very mildly and does not exceed 4.1% for the fused systems. The major augmentation of the robustness to spoofing of the systems after
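The HTER_ω curves in the excerpt above mix the two kinds of false accepts (zero-effort impostors and spoofing attacks) with a weight ω. A rough sketch of one plausible operating-point computation under that reading; the function name and exact weighting formula are assumptions for illustration, not taken verbatim from the excerpted paper:

```python
def epsc_point(far, sfar, frr, omega):
    """Hypothetical sketch of one EPSC operating point: the weighted
    false acceptance rate mixes the zero-effort rate (far) and the
    spoofing rate (sfar) with weight omega, then HTER_omega averages
    it with the false rejection rate (frr). All rates in percent."""
    far_omega = (1.0 - omega) * far + omega * sfar
    return (far_omega + frr) / 2.0

# e.g. a system with FAR = 1%, SFAR = 90%, FRR = 2%:
# at omega = 0 spoofing is ignored; at omega = 1 only spoofing counts.
print(epsc_point(1.0, 90.0, 2.0, 0.0))  # 1.5
print(epsc_point(1.0, 90.0, 2.0, 1.0))  # 46.0
```

This mirrors the qualitative behaviour in the excerpt: a system that looks fine at ω = 0 can degrade badly as ω grows if it is vulnerable to spoofing.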
Current system for sharing ideas, knowledge, findings and results.
5 / 46
My results are better than yours !
Comparing to prior work ?
6 / 46
My results are better than yours !
Maybe trivial or easy
The pseudo-code is inside the paper !
7 / 46
My results are better than yours !
Maybe not that easy
You just need 4 M face images to train 120 M parameters !
8 / 46
My results are better than yours !
Maybe not that easy
You just need the person who wrote the paper !
9 / 46
Scientific credibility and reproducibility
An article about computational science in a scientific publication is not the scholarship¹ itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

D. Donoho, "An invitation to reproducible computational research", Oxford Journals, Biostatistics, Vol. 11, no. 3, pp. 385-388, 2010

¹ Knowledge resulting from study and research in a particular field
10 / 46
Scientific credibility and reproducibility
Scientific misconduct
It has recently been shown by MIT researchers that the reviewing process that determines article acceptance in some conferences and journals may be tricked by publications with machine-generated content.²

arXiv vs snarXiv: guess the real paper !
http://snarxiv.org/vs-arxiv/
"Exploring S-duality in Models of Spacetime Foam" vs "SuperWIMP Cosmology and Collider Physics" ?

² http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763
11 / 46
Scientific credibility and reproducibility
Currently, data sets, code and actionable software are excluded upon recording and preservation of articles.

This slows down potential scientific development in at least two major aspects:
• re-using ideas from different sources normally implies the re-development of the software leading to the original results
• the reviewing process of candidate ideas is based on trust rather than on hard, verifiable evidence that can be thoroughly analyzed.
12 / 46
Data, Software and Competitions
Public biometric datasets
FERET, FRGC, NIST DBs, XM2VTS, BANCA, MOBIO, CASIA-FASD, REPLAY, MSU MFSD, ...

Open source software
RAVL, VXL, OpenCV, Torch, CSU FR, BOB, OpenBR, Theano, ...

Competitions
ICB, BTAS, IJCB, NIST, FVC-onGoing, ...

Platforms
to distribute data, share results, organize competitions
13 / 46
Existing platforms
Biometrics Ideal Test (China)
http://biometrics.idealtest.org
distribution of databases and submission of executables
14 / 46
Existing platforms
FVC-onGoing (Italy)
https://biolab.csr.unibo.it/FVCOnGoing
mostly evaluation of fingerprint recognition algorithms and submission of executables
15 / 46
Existing platforms
NIST iVector (US)
https://ivectorchallenge.nist.gov
specific feature vectors distributed and submission of results
16 / 46
Existing platforms
Kaggle
https://www.kaggle.com
propose a competition + award, download data, and submit results
17 / 46
Company: what is the business model ?
17 / 46
Existing platforms
Codalab
https://www.codalab.org
provide databases, program and execute research pipelines (Worksheets), Microsoft Outercurve Foundation
18 / 46
Data protection regulations (IRB) and IPR ?
18 / 46
Existing platforms
Elsevier SoftwareX
http://www.journals.elsevier.com/softwarex/
only archiving of code on gitlab
19 / 46
Concept of "Executable Paper"
Elsevier Executable Paper Grand Challenge (2011)

"a contest created to improve the way scientific information is communicated and used. The purpose of the Executable Paper Challenge is to invite scientists to put forth their ideas pertaining to these pressing and unsolved questions."

• How can we develop a model for executable files that is compatible with the user's operating system and architecture, and adaptable to future systems ?
• How do we manage very large file sizes ?
• How do we validate data and code, and decrease the reviewer's workload ?
• How to support registering and tracking of actions taken on the "executable paper" ?

http://www.executablepapers.com
20 / 46
Enter "Reproducible Research" (RR)³

One term that aggregates work comprising:
• a paper that describes your work in all relevant details
• code to reproduce all results
• data required to reproduce the results
• instructions on how to apply the code on the data to replicate the results in the paper.

³ http://reproducibleresearch.net
21 / 46
Levels of Reproducibility⁴

With respect to an independent researcher (reader):

0 Irreproducible
1 Cannot seem to reproduce
2 Reproducible, with extreme effort (> 1 month)
3 Reproducible, with considerable effort (> 1 week)
4 Easily reproducible (∼ 15 min.), but requires proprietary software (e.g. Matlab)
5 Easily reproducible (∼ 15 min.), only free software

⁴ Reproducible Research in Signal Processing: What, why and how, Vandewalle, Kovacevic and Vetterli, IEEE Signal Processing Magazine, vol. 26, no. 3, May 2009, pp. 37-47
22 / 46
Reader – Writer interaction
23 / 46
Sadly
writing and distributing code and data takes time ...
24 / 46
Incentive
Boost your research impact (visibility)
• Lower entrance barrier to your publications
• The current number of RR papers is rather small – you have a clear chance to stand out today:
  • Only 10% of TIP papers provide source code⁵.
• Statistically, your work is more valuable if it is RR:
  • 13 out of the top 15 most cited articles in TPAMI or TIP provide (at least) source code
  • The average number of citations for papers that provide source code in TIP is sevenfold that of papers that don't.

⁵ Code Sharing is Associated with Research Impact in Image Processing, Patrick Vandewalle, 2012
25 / 46
What can be improved ?
• Downloading and storing data may be a privacy concern in many countries:
  • Need to work out space for the growing number of samples
  • Not all databases are distributable (e.g. forensic data)
• Software management and installation can be hard
• Software gets outdated: constant quality and integration
• Plan for errors: re-distribution mechanism
• Computing can be limited
26 / 46
Pushing RR to the next level

From the results in a paper:

Method | FAR (FMR) | FRR (FNMR) | HTER
ISV    | 0.178%    | 0.228%     | 0.203%

to the same results on a trusted third-party, just by clicking on an attestation.
27 / 46
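The HTER column of the table above is just the average of the false acceptance and false rejection rates, which is easy to verify:

```python
# Verifying the HTER value reported in the table above:
# HTER is the average of FAR (FMR) and FRR (FNMR).
far = 0.178  # FAR (FMR), in %
frr = 0.228  # FRR (FNMR), in %
hter = (far + frr) / 2.0
print(f"HTER = {hter:.3f}%")  # -> HTER = 0.203%
```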
Moving to BEAT: A web platform for RR
• Accessible: no need to install extra software
• Intuitive: graphically connect blocks to run experiments
• Social: engagement gets you more processing power
• Productive: search prior results by any filtering criteria
• Data Privacy:
  • No need to handle large-scale databases
  • Can run on un-distributable data (e.g. proprietary databases)
• Assurance:
  • fair (reproducible) evaluations of algorithms
  • online attestations for all produced results
• Free: built on open-source software and standards
28 / 46
BEAT platform: front-page
29 / 46
BEAT platform: dashboard
List, search, run experiments and more
30 / 46
BEAT platform: databases
Privacy-by-design
31 / 46
BEAT platform: teams
Grouping users for labs, competitions or industrial projects
32 / 46
BEAT platform: experiments cloning
33 / 46
BEAT platform: experiments re-run
34 / 46
BEAT platform: attestations
Certify published results
35 / 46
BEAT platform (beta)
Open to public now !
http://www.beat-eu.org/platform
A "cloud computing" platform for easy online access to experimentation and testing for Biometrics and beyond !
36 / 46
Pushing RR to the next level ?

0 Irreproducible
1 Cannot seem to reproduce
2 Reproducible, with extreme effort (> 1 month)
3 Reproducible, with considerable effort (> 1 week)
4 Easily reproducible (∼ 15 min.), but requires proprietary software (e.g. Matlab)
5 Easily reproducible (∼ 15 min.), only free software
6 Easily reproducible (∼ 1 min.), only with a web-browser
37 / 46
BEAT: A web platform for RR
• Accessible: no need to install extra software
• Intuitive: graphically connect blocks to run experiments
• Social: engagement gets you more processing power
• Productive: search prior results by any filtering criteria
• Data Privacy:
  • No need to handle large-scale databases
  • Can run on un-distributable data (e.g. proprietary databases)
• Assurance:
  • fair (reproducible) evaluations of algorithms
  • online attestations for all produced results
• Free: built on open-source software and standards
38 / 46
Next
Tutorial on BEAT at BTAS 2015
• Introduction, motivation, requirements and design of the BEAT platform
• Exploring existing components at the BEAT platform
• Registered user interaction; adding new components to the BEAT platform
39 / 46
Next
Final release by Jan 2016
• Reputation system to gamify the platform
• Paper generator to export tables, figures (and their data) into re-usable material for publications (LaTeX)
• Remote Software Development Kit (SDK)

The future of the BEAT platform
• Host more biometric databases
• Organize competitions
• Multiple backends: GPU, executables, MATLAB
• Install the platform in different institutions with different databases
40 / 46
Thanks

Tribute to

Researchers:
• Andre Anjos and Laurent El-Shafey
• Tiago de Freitas Pereira, Manuel Gunther, Elie Khoury

Engineers:
• Philip Abbet, Samuel Gaist, Flavio Tarsetti

Partners of the EU BEAT project (www.beat-eu.org):
• Universidad Autonoma de Madrid (UAM)
• University of Surrey (UNIS)
• Safran Morpho
• TUViT
• Katholieke Universiteit Leuven
41 / 46
Thank you for your attention
Sebastien Marcel
Idiap Research Institute, Switzerland
www.idiap.ch/˜marcel

BEAT: Biometrics Evaluation and Testing
www.beat-eu.org/platform
groups.google.com/d/forum/beat-devel

Swiss Biometrics Center: Swiss Center for Biometrics Research and Testing
www.biometrics-center.ch
42 / 46
BEAT: A web platform for RR
Research pipeline = Toolchains + Blocks
43 / 46
BEAT: A web platform for RR
Blocks

[Block diagram] A Block contains an Algorithm, with access to Storage, and is configured by parameters.
• Inputs (Input API): each one accepts one data format
• Outputs (Output API): each one produces one data format
• Configuration: Parameter #1, Parameter #2, ..., Parameter #N
Data flows in through the inputs, through the algorithm, and out through the outputs.
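The block described above can be pictured in code; this is a purely illustrative sketch (the class and method names are hypothetical, not the actual BEAT SDK API):

```python
import statistics

class Block:
    """Illustrative model of a BEAT processing block: typed inputs,
    typed outputs, a parameter set and an algorithm in the middle.
    All names here are hypothetical, not the real platform API."""

    def __init__(self, algorithm, parameters):
        self.algorithm = algorithm    # user-supplied processing code
        self.parameters = parameters  # configuration, e.g. {"n": 5}

    def process(self, inputs):
        # each input/output carries exactly one declared data format
        return self.algorithm(inputs, **self.parameters)

# Toy usage: a block whose algorithm averages its single input.
mean_block = Block(
    lambda data, **p: {"mean": statistics.fmean(data["values"])},
    parameters={},
)
out = mean_block.process({"values": [1.0, 2.0, 3.0]})
print(out)  # {'mean': 2.0}
```

The point of the abstraction is that the platform, not the user, moves data between blocks, so any two blocks agreeing on a data format can be connected graphically.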
44 / 46
BEAT: A web platform for RR
Example of toolchain: Eigenfaces
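The Eigenfaces toolchain amounts to PCA on vectorized face images. A minimal sketch, with random data standing in for a face database (dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n_faces, n_pixels = 20, 64  # synthetic stand-in for raster-scanned faces
faces = rng.standard_normal((n_faces, n_pixels))

# Center the training images on the mean face.
mean_face = faces.mean(axis=0)
centered = faces - mean_face

# Eigenfaces are the principal components of the centered images.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = Vt[:5]  # keep the top 5 components

# Project a probe image onto the eigenface subspace; the resulting
# weight vector is what gets compared during verification.
probe = rng.standard_normal(n_pixels)
weights = eigenfaces @ (probe - mean_face)
assert weights.shape == (5,)
```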
45 / 46
BEAT platform (beta)
Computing and Storage

Chassis: IBM Flex System Enterprise Chassis
Dual Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz (20 cores per node), 256 GB RAM, 2 x 10 GbE

HP Procurve 8212zl Backbone (10 GbE)

NetApp 3220 dual-head network storage: 20 TB (mirrored Hybrid SSD/SAS), 10 TB usable capacity, 4 x 10 GbE

Total: 80 cores (12 GB RAM/core) and 20 TB of cache
46 / 46