



Meet-up 2019 | Doctorants & Industrie
CONTROLLED REPRESENTATION LEARNING: TOWARDS MORE RELIABLE MACHINE LEARNING
Victor BOUVIER 1,2

Céline Hudelot 1, Myriam Tami 1, Clément Chastagnol 2, Philippe Very 2
1 CentraleSupélec, Université Paris-Saclay · 2 Sidetrade

Interface - Digicosme · CentraleSupélec
Victor Bouvier - [email protected] · CIFRE with Sidetrade

Abstract
Provided that a large amount of annotated data exists, deep learning allows valuable information to be derived from high-dimensional data, such as text or images, opening the way to a wide range of applications in many industrial sectors. The main ingredient of this success is its ability to extract, and infer from, a huge number of weak statistical patterns. Quite surprisingly, this strength can become a weakness in an industrial context: many factors may change in a production environment, and their impact on deep models remains poorly understood. The research hypothesis of this thesis is to search for properties that remain stable under varying modalities. The promise of this approach is to infer from an intrinsic property of the task rather than from spurious, or too specific, statistical patterns. We studied the connections of our research hypothesis with domain adaptation, a well-established line of study which uses unlabelled data to adapt models to new environments. We have shown that our new point of view makes it possible to formulate a generic way of dealing with adaptation. Our claim is supported both theoretically and empirically, and it adds a step towards unifying two lines of study (Importance Sampling and Invariant Representations) by discussing the role of compression of the representations. In future work, we want to exhibit the limits of domain-invariant representations for learning invariant predictors. More precisely, we believe that domain invariance is insufficient and provides only very sparse (binary) information for identifying stable properties in the data. To enrich this information, we aim to develop Active Learning strategies to quickly and cost-effectively identify sources of spurious correlations in a context where unlabelled production data are available. Ultimately, we wish to connect our research with counterfactual analysis, which has strategic applications in industry, particularly in decision-making. The potential applications of our research are extensive and include Transfer Learning, Robustness, Privacy and Fairness.

1 THE CURSE OF SUPERVISED LEARNING

High specialization degrades True Generalization

Figure 1: Recognition in Terra Incognita [3]

• Deep models learn spurious correlations
• Not robust to new environments or domains

Providing reliable models in production is a major challenge in Machine Learning.

2 CONTROLLED REPRESENTATION LEARNING

[Figure 2 diagram: two graphical models over Y, the features X1, ..., Xp, and the prediction Ŷ. Left panel, "Specialized prediction", learned from hand-curated training data: the prediction relation differs (≠) between the training environments and the production environment. Right panel, "Controlled prediction", where a controlled representation is learned with the source variable S as control: the prediction relation is preserved (=) from training environments to the production environment.]

Figure 2: Supervised Learning vs. Controlled Representation Learning

• Source environment (S): prior knowledge of what a reliable model must remain insensitive to
• E[Y | Z] is invariant w.r.t. S [2]
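Written out, this control condition says that the regression function given Z does not depend on the environment (a restatement of [2] in our notation, with Z = φ(X) and S the source variable):

\[
\forall z,\ \forall s, s':\qquad
\mathbb{E}\left[\, Y \mid Z = z,\ S = s \,\right]
\;=\;
\mathbb{E}\left[\, Y \mid Z = z,\ S = s' \,\right]
\]

In words: once Z is known, the source S carries no further information about Y.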

3 CONTRIBUTIONS

1. Is a representation controlled? [5]
   • Filtering strategy for control testing
   • A generic method for introducing bias into a dataset (a toy sketch of this idea follows below)
2. How to learn a controlled representation? [4]
   • In both a supervised and an unsupervised manner
   • In the binary source case, demonstrated better generalization properties than traditional methods
3. What is a good controlled representation? [1]
   • The representations that best preserve the original feature space have the best guarantees
   • Role of Weighted Representations
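To make the bias-injection idea concrete, here is a minimal sketch (a hypothetical helper, not the exact method of [5]); it assumes a binary label y and plants a spurious correlation between one feature column and the label:

import numpy as np

def inject_spurious_bias(X, y, col, strength=0.9, seed=0):
    # Toy bias injection: on a `strength` fraction of rows, feature
    # `col` simply copies the binary label; on the rest it is noise.
    # A specialized model will latch onto this shortcut.
    rng = np.random.default_rng(seed)
    X = X.copy()
    agree = rng.random(len(y)) < strength
    X[agree, col] = y[agree]
    X[~agree, col] = rng.integers(0, 2, size=(~agree).sum())
    return X

Training on the biased split and evaluating on data where the planted correlation is absent then reveals whether the learned representation relies on the shortcut or on the task itself.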

4 APPLICATIONS

Find a representation Z = φ(X) such that:

• Domain Adaptation: S = {source, target}, with E_source[Y | Z] = E_target[Y | Z]
• Fair ML: S = {female, male}, with E_female[Y | Z] = E_male[Y | Z]
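Either condition can be probed empirically. The sketch below (ours, not code from the cited papers; it assumes a scalar representation Z and a binary source label s ∈ {0, 1}) bins Z and compares the per-group estimates of E[Y | Z, S]:

import numpy as np

def conditional_gap(Z, y, s, n_bins=10):
    # Bin the scalar representation Z into quantile bins, then compare
    # the per-bin estimates of E[Y | Z, S] between the two groups.
    # A small maximal gap suggests E[Y | Z] is invariant w.r.t. S.
    edges = np.quantile(Z, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(Z, edges[1:-1]), 0, n_bins - 1)
    gaps = []
    for b in range(n_bins):
        m0 = (idx == b) & (s == 0)
        m1 = (idx == b) & (s == 1)
        if m0.any() and m1.any():
            gaps.append(abs(y[m0].mean() - y[m1].mean()))
    return max(gaps) if gaps else 0.0

A gap close to zero across all bins is evidence that the representation is controlled; a large gap flags a representation through which the source still leaks into the prediction.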

5 FUTURE WORK

1. Theoretical foundations of CRL
2. How to select controlled representations?
3. Beyond the binary source case
4. How to actively select samples?
5. Open-source Python library:

from csl import InvariantConditional

X, Y, S = data()

model = InvariantConditional()
model.fit(features=X, labels=Y, sources=S)
Z = model.control(X)

References
[1] Anonymous. Domain-invariant representations: A look on compression and weights. In Submitted to International Conference on Learning Representations, 2020. Under review.
[2] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
[3] Sara Beery, Grant Van Horn, and Pietro Perona. Recognition in terra incognita. In Proceedings of the European Conference on Computer Vision (ECCV), pages 456–473, 2018.
[4] Victor Bouvier, Philippe Very, Céline Hudelot, and Clément Chastagnol. Hidden covariate shift: A minimal assumption for domain adaptation. arXiv preprint arXiv:1907.12299, 2019.
[5] Victor Bouvier, Philippe Very, Céline Hudelot, and Clément Chastagnol. Learning invariant representations for sentiment analysis: The missing material is datasets. arXiv preprint arXiv:1907.12305, 2019.

Acknowledgements
We would like to thank Sidetrade and ANRT for funding.