Reproducible research in biometrics - moving to the BEAT...
TRANSCRIPT
1 / 46
Reproducible research in Biometrics: Moving to the BEAT
Sebastien Marcel, Idiap Research Institute
Switzerland, www.idiap.ch/˜marcel
International Conference on Biometrics (ICB), Phuket – May 21, 2015
2 / 46
Then a miracle occurs !
How many times?
3 / 46
Then a miracle occurs !
Came across a publication and openly decided to ignore it because it would be too hard to apply it to your research?
3 / 46
Then a miracle occurs !
Worked day and night to incorporate some results into your own work but:
• There were untold parameters that needed adjustment and you couldn't get hold of them?
• Realized the proposed algorithm worked only on the specific data shown in the original paper?
• Realized that something did not quite add up in the end?
3 / 46
Don’t touch anything !
Had to take over the work from a student or another colleague that left and had to start from scratch - months into programming to make things work again?
4 / 46
Don’t touch anything !
Would have liked to reply to someone about your work, but you couldn't really remember all the details from when you first made it work? Or you could not make it work at all?
4 / 46
Scientific publications
SUPPLEMENTAL MATERIAL 1
A Scalable Formulation of Probabilistic Linear Discriminant Analysis: Applied to Face Recognition
Laurent El Shafey, Chris McCool, Roy Wallace, and Sebastien Marcel
APPENDIX A
MATHEMATICAL DERIVATIONS
The goal of the following section is to provide more detailed proofs of the formulae given in the article for both training and computing the likelihood.
The following proofs make use of a formulation of the inverse of a block matrix that uses the Schur complement. The corresponding identity can be found in [1] (Equations 1.11 and 1.10),

$$\begin{bmatrix} L & M \\ N & O \end{bmatrix}^{-1} = \begin{bmatrix} R & -RMO^{-1} \\ -O^{-1}NR & O^{-1} + O^{-1}NRMO^{-1} \end{bmatrix}, \qquad (51)$$

where we have substituted $R = \left(L - MO^{-1}N\right)^{-1}$.

Another related expression is the Woodbury matrix identity (Equation C.7 of [2]), which states that,

$$(L + MON)^{-1} = L^{-1} - L^{-1}M\left(O^{-1} + NL^{-1}M\right)^{-1}NL^{-1}. \qquad (52)$$
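Identity (52) is straightforward to check numerically; a minimal sketch with random, conformable matrices (the dimensions and seed are arbitrary, chosen only so every matrix involved is well-conditioned):

```python
import numpy as np

# Numerical sanity check of the Woodbury identity (52):
# (L + M O N)^{-1} = L^{-1} - L^{-1} M (O^{-1} + N L^{-1} M)^{-1} N L^{-1}
rng = np.random.default_rng(42)
n, k = 5, 2
L = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # well-conditioned n x n
M = rng.standard_normal((n, k))
O = np.eye(k) + 0.1 * rng.standard_normal((k, k))  # invertible k x k
N = rng.standard_normal((k, n))

lhs = np.linalg.inv(L + M @ O @ N)
Li = np.linalg.inv(L)
rhs = Li - Li @ M @ np.linalg.inv(np.linalg.inv(O) + N @ Li @ M) @ N @ Li
assert np.allclose(lhs, rhs)
```

The identity is useful exactly when k is much smaller than n: only a k x k system needs to be solved instead of an n x n one.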
A. Scalable training
The bottleneck of the training procedure is the expectation step (E-Step) of the Expectation-Maximization algorithm. This E-Step requires the computation of the first and second order moments of the latent variables.
1) Estimating the first order moment of the Latent Variables: The most computationally expensive part when estimating the latent variables is the inversion of the matrix $\mathring{P}$ (Equation (27)). This matrix is block diagonal, the two blocks being $P_0$ (Equation (28)) and (a repetition of) $P_1$ (Equation (29)),

$$\mathring{P} = \begin{bmatrix} P_0 & 0 & \cdots & 0 \\ 0 & P_1 & \ddots & 0 \\ 0 & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & P_1 \end{bmatrix}. \qquad (53)$$
L. El Shafey is with Idiap Research Institute and Ecole Polytechnique Federale de Lausanne, Switzerland. E-mail: [email protected]
C. McCool, R. Wallace and S. Marcel are with Idiap Research Institute, Martigny, Switzerland. E-mail: {christopher.mccool,roy.wallace,sebastien.marcel}@idiap.ch
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7) under grant agreements 238803 (BBfor2) and 257289 (TABULA RASA).

The inverse of $P_1$ is equal to the matrix $G$, defined by (30). This matrix is of constant size ($D_G \times D_G$), irrespective of the number of training samples for the class. In addition, the inversion of $P_0$ can be further optimised using the block matrix inversion identity introduced at the beginning of this section, leading to
$$P_0^{-1} = \begin{bmatrix} F_{J_i} & \sqrt{J_i}\,H^T \\ \sqrt{J_i}\,H & \left(I_{D_G} - J_i H F^T \Sigma^{-1} G\right) G \end{bmatrix}, \qquad (54)$$

where $F_{J_i}$ is defined by (33) and $H$ by (37).
Then, the computation of $\mathring{P}^{-1}\mathring{A}^T\Sigma^{-1}$ gives a block diagonal matrix, the first block being

$$\begin{bmatrix} \sqrt{J_i}\,F_{J_i} F^T S \\ G G^T \Sigma^{-1}\left(I_{D_x} - J_i F F_{J_i} F^T S\right) \end{bmatrix},$$

and the other ones being equal to $G G^T \Sigma^{-1}$.

As explained in section III.B.a of the article, $h_i$ corresponds to the upper sub-vector of $\mathring{y}_i$ and is not affected by the change of variable, as depicted in (21). Therefore, the first order moment of $h_i$ is directly obtained by multiplying the first block-rows of the matrix $\mathring{P}^{-1}\mathring{A}^T\Sigma^{-1}$ with $\mathring{x}_i$, which gives (31).

Considering only the $\mathring{w}_i$ (lower) sub-vector of $\mathring{y}_i$, the corresponding (lower) part $\mathring{B}$ of the matrix $\mathring{P}^{-1}\mathring{A}^T\Sigma^{-1}$ can be decomposed into a sum of two matrices, the first one being sparse with a single non-zero block (upper left) equal to $B_0 = -J_i G G^T \Sigma^{-1} F F_{J_i} F^T S$, and the second one being diagonal by blocks with identical blocks $B_1 = G G^T \Sigma^{-1}$,
$$\mathring{B} = \begin{bmatrix} B_0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} B_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & B_1 \end{bmatrix}. \qquad (55)$$
Furthermore, the first order moment of the variables $w_i$ is given by

$$E\left[w_i \mid x_i, \Theta\right] = \left(U^T \otimes I_{D_G}\right)\begin{bmatrix} B_0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\mathring{x}_i + \left(U^T \otimes I_{D_G}\right)\begin{bmatrix} B_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & B_1 \end{bmatrix}\left(U \otimes I_{D_x}\right)x_i. \qquad (56)$$
The previous decomposition greatly simplifies the computation, and leads to the following expression for each $w_{i,j}$,

$$E\left[w_{i,j} \mid x_i, \Theta\right] = G G^T \Sigma^{-1} x_{i,j} - G G^T \Sigma^{-1} F F_{J_i} F^T S \sum_j x_{i,j}. \qquad (57)$$
JOURNAL TO BE DEFINED, VOL. X, NO. T, JANUARY 2013, page 12

[Figure omitted. Fig. 10: Score distributions of baseline face verification systems; the full green line shows how SFAR changes with moving the threshold. Panels (verification scores vs. normalized count, with SFAR (%) overlaid): (a) GMM, SFAR 91.5%; (b) LGBPHS, SFAR 88.5%; (c) GGG, SFAR 95.0%; (d) ISV, SFAR 92.6%.]
[Figure omitted. Fig. 12: EPSC for comparison of fusion techniques of baselines with the LBP anti-spoofing algorithm. Top row, HTER_ω (%) vs. weight ω: (a) GMM, (b) LGBPHS, (c) GGG, (d) ISV. Bottom row, SFAR (%) vs. weight ω: (e) GMM, (f) LGBPHS, (g) GGG, (h) ISV.]
D. Performance of fused systems

In our last experiment, we compare the four face verification systems when fused with ALL counter-measures using the PLR fusion scheme. Firstly, we illustrate how fusion changes the score distribution for each of them separately in Figure 14. Then, in Figure 15 we compare which of the fused systems performs the best.

While Figure 10 shows that the spoofing attacks of Replay-Attack are in the optimal category when fed to the baseline face verification systems, Figure 14 illustrates that their effectiveness has vastly changed after fusion. The score distribution of the spoofing attacks is now mostly overlapping with the score distribution of the zero-effort impostors, allowing for better discriminability between the positive class and the two negative classes. The results reflect this observation: even when the threshold is obtained using the licit scenario, SFAR has dropped to less than 6%.

The comparison between the EPSC curves given in Figure 11(a) and Figure 15(a) confirms the above observations:
[Figure omitted. Fig. 15: EPSC curves to compare fused systems: (a) HTER_ω (%) vs. weight ω; (b) SFAR (%) vs. weight ω.]
while HTER_ω increases rapidly with ω and reaches up to 25% for some of the baseline systems, it increases very mildly and does not exceed 4.1% for the fused systems. The major augmentation of the robustness to spoofing of the systems after
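The HTER_ω curves in the excerpt above mix the two kinds of false accepts (zero-effort impostors and spoofing attacks) with a weight ω. A rough sketch of one plausible operating-point computation under that reading; the function name and exact weighting formula are assumptions for illustration, not taken verbatim from the excerpted paper:

```python
def epsc_point(far, sfar, frr, omega):
    """Hypothetical sketch of one EPSC operating point: the weighted
    false acceptance rate mixes the zero-effort rate (far) and the
    spoofing rate (sfar) with weight omega, then HTER_omega averages
    it with the false rejection rate (frr). All rates in percent."""
    far_omega = (1.0 - omega) * far + omega * sfar
    return (far_omega + frr) / 2.0

# e.g. a system with FAR = 1%, SFAR = 90%, FRR = 2%:
# at omega = 0 spoofing is ignored; at omega = 1 only spoofing counts.
print(epsc_point(1.0, 90.0, 2.0, 0.0))  # 1.5
print(epsc_point(1.0, 90.0, 2.0, 1.0))  # 46.0
```

This mirrors the qualitative behaviour in the excerpt: a system that looks fine at ω = 0 can degrade badly as ω grows if it is vulnerable to spoofing.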
Current system for sharing ideas, knowledge, findings and results.
5 / 46
My results are better than yours !
Comparing to prior work ?
6 / 46
My results are better than yours !
Maybe trivial or easy
The pseudo-code is inside the paper !
7 / 46
My results are better than yours !
Maybe not that easy
You just need 4 M face images to train 120 M parameters !
8 / 46
My results are better than yours !
Maybe not that easy
You just need the person who wrote the paper !
9 / 46
Scientific credibility and reproducibility
An article about computational science in a scientific publication is not the scholarship¹ itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

D. Donoho, "An invitation to reproducible computational research", Oxford Journals, Biostatistics, Vol. 11, no. 3, pp. 385-388, 2010

¹ Knowledge resulting from study and research in a particular field
10 / 46
Scientific credibility and reproducibility
Scientific misconduct
It has recently been shown by MIT researchers that the reviewing process that determines article acceptance in some conferences and journals may be tricked by publications with machine-generated content.²

arXiv vs snarXiv: guess the real paper !
http://snarxiv.org/vs-arxiv/
"Exploring S-duality in Models of Spacetime Foam" vs "SuperWIMP Cosmology and Collider Physics" ?

² http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763
11 / 46
Scientific credibility and reproducibility
Currently, data sets, code and actionable software are excluded upon recording and preservation of articles.

This slows down potential scientific development in at least two major aspects:
• re-using ideas from different sources normally implies the re-development of the software leading to the original results
• the reviewing process of candidate ideas is based on trust rather than on hard, verifiable evidence that can be thoroughly analyzed.
12 / 46
Data, Software and Competitions
Public biometric datasets
FERET, FRGC, NIST DBs, XM2VTS, BANCA, MOBIO, CASIA-FASD, REPLAY, MSU MFSD, ...

Open source software
RAVL, VXL, OpenCV, Torch, CSU FR, BOB, OpenBR, Theano, ...

Competitions
ICB, BTAS, IJCB, NIST, FVC-onGoing, ...

Platforms
to distribute data, share results, organize competitions
13 / 46
Existing platforms
Biometrics Ideal Test (China)
http://biometrics.idealtest.org
distribution of databases and submission of executables
14 / 46
Existing platforms
FVC-onGoing (Italy)
https://biolab.csr.unibo.it/FVCOnGoing
mostly evaluation of fingerprint recognition algorithms and submission of executables
15 / 46
Existing platforms
NIST iVector (US)
https://ivectorchallenge.nist.gov
specific feature vectors distributed and submission of results
16 / 46
Existing platforms
Kaggle
https://www.kaggle.com
propose a competition + award, download data, and submit results
17 / 46
Company: what is the business model ?
17 / 46
Existing platforms
Codalab
https://www.codalab.org
provide databases, program and execute research pipelines (Worksheets), Microsoft Outercurve Foundation
18 / 46
Data protection regulations (IRB) and IPR ?
18 / 46
Existing platforms
Elsevier SoftwareX
http://www.journals.elsevier.com/softwarex/
only archiving of code on gitlab
19 / 46
Concept of "Executable Paper"
Elsevier Executable Paper Grand Challenge (2011)

"a contest created to improve the way scientific information is communicated and used. The purpose of the Executable Paper Challenge is to invite scientists to put forth their ideas pertaining to these pressing and unsolved questions."

• How can we develop a model for executable files that is compatible with the user's operating system and architecture, and adaptable to future systems ?
• How do we manage very large file sizes ?
• How do we validate data and code, and decrease the reviewer's workload ?
• How to support registering and tracking of actions taken on the "executable paper" ?

http://www.executablepapers.com
20 / 46
Enter "Reproducible Research" (RR)³

One term that aggregates work comprising:
• a paper that describes your work in all relevant details
• code to reproduce all results
• data required to reproduce the results
• instructions on how to apply the code on the data to replicate the results in the paper.

³ http://reproducibleresearch.net
21 / 46
Levels of Reproducibility⁴

With respect to an independent researcher (reader):

0 Irreproducible
1 Cannot seem to reproduce
2 Reproducible, with extreme effort (> 1 month)
3 Reproducible, with considerable effort (> 1 week)
4 Easily reproducible (∼ 15 min.), but requires proprietary software (e.g. Matlab)
5 Easily reproducible (∼ 15 min.), only free software

⁴ Reproducible Research in Signal Processing: What, why and how, Vandewalle, Kovacevic and Vetterli, IEEE Signal Processing Magazine, vol. 26, no. 3, May 2009, pp. 37-47
22 / 46
Reader – Writer interaction
23 / 46
Sadly
writing and distributing code and data takes time ...
24 / 46
Incentive
Boost your research impact (visibility)
• Lower entrance barrier to your publications
• The current number of RR papers is rather small – you have a clear chance to stand out today:
  • Only 10% of TIP papers provide source code⁵.
• Statistically, your work is more valuable if it is RR:
  • 13 out of the top 15 most cited articles in TPAMI or TIP provide (at least) source code
  • The average number of citations for papers that provide source code in TIP is sevenfold that of papers that don't.

⁵ Code Sharing is Associated with Research Impact in Image Processing, Patrick Vandewalle, 2012
25 / 46
What can be improved ?
• Downloading and storing data may be a privacy concern in many countries:
  • Need to work out space for the growing number of samples
  • Not all databases are distributable (e.g. forensic data)
• Software management and installation can be hard
• Software gets outdated: constant quality and integration
• Plan for errors: re-distribution mechanism
• Computing can be limited
26 / 46
Pushing RR to the next level

From the results in a paper:

Method | FAR (FMR) | FRR (FNMR) | HTER
ISV    | 0.178%    | 0.228%     | 0.203%

to the same results on a trusted third-party, just by clicking on an attestation.
27 / 46
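The HTER column of the table above is just the average of the false acceptance and false rejection rates, which is easy to verify:

```python
# Verifying the HTER value reported in the table above:
# HTER is the average of FAR (FMR) and FRR (FNMR).
far = 0.178  # FAR (FMR), in %
frr = 0.228  # FRR (FNMR), in %
hter = (far + frr) / 2.0
print(f"HTER = {hter:.3f}%")  # -> HTER = 0.203%
```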
Moving to BEAT: A web platform for RR
• Accessible: no need to install extra software
• Intuitive: graphically connect blocks to run experiments
• Social: engagement gets you more processing power
• Productive: search prior results by any filtering criteria
• Data Privacy:
  • No need to handle large-scale databases
  • Can run on un-distributable data (e.g. proprietary databases)
• Assurance:
  • fair (reproducible) evaluations of algorithms
  • online attestations for all produced results
• Free: built on open-source software and standards
28 / 46
BEAT platform: front-page
29 / 46
BEAT platform: dashboard
List, search, run experiments and more
30 / 46
BEAT platform: databases
Privacy-by-design
31 / 46
BEAT platform: teams
Grouping users for labs, competitions or industrial projects
32 / 46
BEAT platform: experiments cloning
33 / 46
BEAT platform: experiments re-run
34 / 46
BEAT platform: attestations
Certify published results
35 / 46
BEAT platform (beta)
Open to public now !
http://www.beat-eu.org/platform
A "cloud computing" platform for easy online access to experimentation and testing for Biometrics and beyond !
36 / 46
Pushing RR to the next level ?

0 Irreproducible
1 Cannot seem to reproduce
2 Reproducible, with extreme effort (> 1 month)
3 Reproducible, with considerable effort (> 1 week)
4 Easily reproducible (∼ 15 min.), but requires proprietary software (e.g. Matlab)
5 Easily reproducible (∼ 15 min.), only free software
6 Easily reproducible (∼ 1 min.), only with a web-browser
37 / 46
BEAT: A web platform for RR
• Accessible: no need to install extra software
• Intuitive: graphically connect blocks to run experiments
• Social: engagement gets you more processing power
• Productive: search prior results by any filtering criteria
• Data Privacy:
  • No need to handle large-scale databases
  • Can run on un-distributable data (e.g. proprietary databases)
• Assurance:
  • fair (reproducible) evaluations of algorithms
  • online attestations for all produced results
• Free: built on open-source software and standards
38 / 46
Next
Tutorial on BEAT at BTAS 2015
• Introduction, motivation, requirements and design of the BEAT platform
• Exploring existing components at the BEAT platform
• Registered user interaction; adding new components to the BEAT platform
39 / 46
Next
Final release by Jan 2016
• Reputation system to gamify the platform
• Paper generator to export tables, figures (and their data) into re-usable material for publications (LaTeX)
• Remote Software Development Kit (SDK)

The future of the BEAT platform
• Host more biometric databases
• Organize competitions
• Multiple backends: GPU, executables, MATLAB
• Install the platform in different institutions with different databases
40 / 46
Thanks

Tribute to

Researchers:
• Andre Anjos and Laurent El-Shafey
• Tiago de Freitas Pereira, Manuel Gunther, Elie Khoury

Engineers:
• Philip Abbet, Samuel Gaist, Flavio Tarsetti

Partners of the EU BEAT project (www.beat-eu.org):
• Universidad Autonoma de Madrid (UAM)
• University of Surrey (UNIS)
• Safran Morpho
• TUViT
• Katholieke Universiteit Leuven
41 / 46
Thank you for your attention
Sebastien Marcel
Idiap Research Institute, Switzerland
www.idiap.ch/˜marcel

BEAT: Biometrics Evaluation and Testing
www.beat-eu.org/platform
groups.google.com/d/forum/beat-devel

Swiss Biometrics Center: Swiss Center for Biometrics Research and Testing
www.biometrics-center.ch
42 / 46
BEAT: A web platform for RR
Research pipeline = Toolchains + Blocks
43 / 46
BEAT: A web platform for RR
Blocks

[Block diagram] A Block contains an Algorithm, with access to Storage, and is configured by parameters.
• Inputs (Input API): each one accepts one data format
• Outputs (Output API): each one produces one data format
• Configuration: Parameter #1, Parameter #2, ..., Parameter #N
Data flows in through the inputs, through the algorithm, and out through the outputs.
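The block described above can be pictured in code; this is a purely illustrative sketch (the class and method names are hypothetical, not the actual BEAT SDK API):

```python
import statistics

class Block:
    """Illustrative model of a BEAT processing block: typed inputs,
    typed outputs, a parameter set and an algorithm in the middle.
    All names here are hypothetical, not the real platform API."""

    def __init__(self, algorithm, parameters):
        self.algorithm = algorithm    # user-supplied processing code
        self.parameters = parameters  # configuration, e.g. {"n": 5}

    def process(self, inputs):
        # each input/output carries exactly one declared data format
        return self.algorithm(inputs, **self.parameters)

# Toy usage: a block whose algorithm averages its single input.
mean_block = Block(
    lambda data, **p: {"mean": statistics.fmean(data["values"])},
    parameters={},
)
out = mean_block.process({"values": [1.0, 2.0, 3.0]})
print(out)  # {'mean': 2.0}
```

The point of the abstraction is that the platform, not the user, moves data between blocks, so any two blocks agreeing on a data format can be connected graphically.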
44 / 46
BEAT: A web platform for RR
Example of toolchain: Eigenfaces
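The Eigenfaces toolchain amounts to PCA on vectorized face images. A minimal sketch, with random data standing in for a face database (dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n_faces, n_pixels = 20, 64  # synthetic stand-in for raster-scanned faces
faces = rng.standard_normal((n_faces, n_pixels))

# Center the training images on the mean face.
mean_face = faces.mean(axis=0)
centered = faces - mean_face

# Eigenfaces are the principal components of the centered images.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = Vt[:5]  # keep the top 5 components

# Project a probe image onto the eigenface subspace; the resulting
# weight vector is what gets compared during verification.
probe = rng.standard_normal(n_pixels)
weights = eigenfaces @ (probe - mean_face)
assert weights.shape == (5,)
```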
45 / 46
BEAT platform (beta)
Computing and Storage

Chassis: IBM Flex System Enterprise Chassis
Dual Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz (20 cores per node), 256 GB RAM, 2 x 10 GbE

HP Procurve 8212zl Backbone (10 GbE)

NetApp 3220 dual-head network storage: 20 TB (mirrored Hybrid SSD/SAS), 10 TB usable capacity, 4 x 10 GbE

Total: 80 cores (12 GB RAM/core) and 20 TB of cache
46 / 46