
Page 1: Cosmology with Bayesian statistics and information theory

Lecture 3: Information theory

… a.k.a. how much is there to be learned in my data anyway?

Florent Leclercq, Institute of Cosmology and Gravitation, University of Portsmouth

http://icg.port.ac.uk/~leclercq/

March 10th, 2017


Page 2: The noisy binary symmetric channel

https://xkcd.com/552/

[Diagram: Source → Encoder → Noisy channel → Decoder]

Rate of information transfer:
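The quantities involved, assuming the standard conventions of MacKay (2003), chap. 1: a code mapping $K$ source bits to $N$ transmitted bits has rate $R = K/N$, and the capacity of the binary symmetric channel with flip probability $f$ is

$C(f) = 1 - H_2(f), \qquad H_2(f) = f \log_2 \frac{1}{f} + (1-f) \log_2 \frac{1}{1-f}$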

Page 3: The R3 code

The R3 repetition code transmits each source bit three times; the decoder takes a majority vote over each received triplet.

MacKay 2003, chap. 1
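A minimal sketch (not from the slides) of the R3 code in Python: encode by repetition, transmit through a binary symmetric channel with flip probability f, and decode by majority vote. Function names are illustrative.

import random

def encode_R3(bits):
    # repeat each source bit three times
    return [b for b in bits for _ in range(3)]

def bsc(bits, f, rng=random.Random(0)):
    # binary symmetric channel: flip each bit independently with probability f
    return [b ^ (rng.random() < f) for b in bits]

def decode_R3(received):
    # majority vote over each received triplet
    return [int(sum(received[i:i + 3]) >= 2) for i in range(0, len(received), 3)]

source = [random.randint(0, 1) for _ in range(10000)]
decoded = decode_R3(bsc(encode_R3(source), f=0.1))
print(sum(s != d for s, d in zip(source, decoded)) / len(source))
# empirical bit-error rate ~ 3 f^2 (1-f) + f^3 = 0.028 for f = 0.1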

Page 4: The (7,4) Hamming code: Encoder

• Introducing the concept of parity-check bits:

MacKay 2003, chap. 1
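A minimal encoder sketch, assuming the parity-check conventions of MacKay (2003), chap. 1, where t5, t6, t7 are the three parity-check bits appended to the four source bits:

def hamming74_encode(s):
    # s = four source bits; each t is the parity of three of them
    s1, s2, s3, s4 = s
    t5 = (s1 + s2 + s3) % 2
    t6 = (s2 + s3 + s4) % 2
    t7 = (s1 + s3 + s4) % 2
    return [s1, s2, s3, s4, t5, t6, t7]

print(hamming74_encode([1, 0, 0, 0]))  # -> [1, 0, 0, 0, 1, 0, 1]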

Page 5: The (7,4) Hamming code: Decoder

• Introducing the concept of the syndrome:

• Error correction:

• Unsuccessful error correction:

[Figure: decoding diagrams, labelled “parity-checks”, “syndrome”, and “identity”]

MacKay 2003, chap. 1
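A matching syndrome-decoder sketch, under the same assumed conventions: the syndrome is the pattern of violated parity checks, and each nonzero syndrome points at the single most probable bit flip.

def hamming74_decode(r):
    # syndrome: which of the three parity checks are violated
    z1 = (r[0] + r[1] + r[2] + r[4]) % 2
    z2 = (r[1] + r[2] + r[3] + r[5]) % 2
    z3 = (r[0] + r[2] + r[3] + r[6]) % 2
    # each nonzero syndrome points at the single most probable flipped bit
    flip = {(1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 2, (0, 1, 1): 3,
            (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6}
    r = list(r)
    pos = flip.get((z1, z2, z3))
    if pos is not None:
        r[pos] ^= 1          # correct the inferred single-bit error
    return r[:4]             # the first four bits carry the source message

print(hamming74_decode([1, 0, 1, 0, 1, 0, 1]))  # third bit flipped -> [1, 0, 0, 0]

With two or more flipped bits the syndrome points at the wrong position, which is the “unsuccessful error correction” case of the slide.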

Page 6: Low-density parity check (LDPC) codes

• Tanner graph: [Figure: bits 1–8 connected to parity constraints a–d]

• $N$ bits, $M = (1-R)N$ parity-check constraints

• $2^{RN}$ possible “words”: the codewords

• A parity-check matrix $H$: a word $\mathbf{t}$ is a codeword iff $H\mathbf{t} = 0 \bmod 2$

• Decoding LDPC codes: the general theory borrows from statistical physics: Ising spins in interaction and the BP mean-field approximation (Bethe–Peierls / Belief Propagation).

MacKay 2003, chap. 47
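A minimal sketch of the parity-check-matrix view: H below is a hypothetical 4×8 low-density matrix (the actual Tanner graph of the slide is not reproduced here), with one row per parity constraint (a–d) and one column per bit (1–8).

H = [[1, 1, 0, 1, 0, 0, 1, 0],   # constraint a
     [0, 1, 1, 0, 1, 0, 0, 1],   # constraint b
     [1, 0, 1, 0, 0, 1, 1, 0],   # constraint c
     [0, 0, 0, 1, 1, 1, 0, 1]]   # constraint d

def is_codeword(word):
    # word is a codeword iff every parity check sums to 0 modulo 2
    return all(sum(h * w for h, w in zip(row, word)) % 2 == 0 for row in H)

print(is_codeword([0] * 8))  # True: the all-zero word satisfies every check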

Page 7: Measures of entropy and information

• Shannon entropy: $H(X) = -\sum_x p(x) \log_2 p(x)$

• Joint entropy: $H(X,Y) = -\sum_{x,y} p(x,y) \log_2 p(x,y)$

• Conditional entropy: $H(Y|X) = -\sum_{x,y} p(x,y) \log_2 p(y|x)$

• Properties:
• chain rule: $H(X,Y) = H(X) + H(Y|X)$
• “Bayes’s theorem”: $H(X|Y) = H(Y|X) + H(X) - H(Y)$

Reviews: Crooks 2016; Leclercq et al. 2016, arXiv:1606.06758 (Appendix A)
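A minimal numerical sketch (not from the slides) of these definitions and of the chain rule, for a small discrete joint distribution:

from math import log2

# a small joint distribution p(x, y) over binary X and Y
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

H_XY = -sum(p * log2(p) for p in p_xy.values())                  # joint entropy
p_x = {x: sum(p for (a, _), p in p_xy.items() if a == x) for x in (0, 1)}
H_X = -sum(p * log2(p) for p in p_x.values())                    # Shannon entropy
# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x)
H_Y_given_X = -sum(p * log2(p / p_x[x]) for (x, _), p in p_xy.items())

print(abs(H_XY - (H_X + H_Y_given_X)) < 1e-12)  # True: chain rule holds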

Page 8: Measures of entropy and information

• Mutual information: $I(X;Y) = H(X) + H(Y) - H(X,Y)$

• Properties: $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) \geq 0$

• Consequence: conditioning reduces entropy, $H(X|Y) \leq H(X)$

• Relative entropy / Kullback–Leibler divergence: $D_{\mathrm{KL}}(p\|q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$

• Properties:
• Gibbs’s inequality: $D_{\mathrm{KL}}(p\|q) \geq 0$, with equality iff $p = q$
• Relation to mutual information: $I(X;Y) = D_{\mathrm{KL}}\big(p(x,y)\,\|\,p(x)p(y)\big)$

Reviews: Crooks 2016; Leclercq et al. 2016, arXiv:1606.06758 (Appendix A)
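A minimal sketch (not from the slides) verifying the relation between mutual information and the Kullback–Leibler divergence on the same toy distribution:

from math import log2

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_x = {x: sum(p for (a, _), p in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in p_xy.items() if b == y) for y in (0, 1)}

def kl(p, q):
    # Gibbs's inequality guarantees kl(p, q) >= 0, with equality iff p == q
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

joint = [p_xy[(x, y)] for x in (0, 1) for y in (0, 1)]
product = [p_x[x] * p_y[y] for x in (0, 1) for y in (0, 1)]
print(kl(joint, product))  # I(X;Y) = D_KL( p(x,y) || p(x)p(y) ) >= 0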

Page 9: Measures of entropy and information

• Symmetrizations of the Kullback–Leibler divergence:
• Jeffreys symmetrization: $D_{\mathrm{J}}(p,q) = D_{\mathrm{KL}}(p\|q) + D_{\mathrm{KL}}(q\|p)$
• Jensen symmetrization: $\mathrm{JSD}(p,q) = \frac{1}{2} D_{\mathrm{KL}}(p\|m) + \frac{1}{2} D_{\mathrm{KL}}(q\|m)$ with $m = \frac{1}{2}(p+q)$: the Jensen–Shannon divergence

• Properties: symmetric; bounded, $0 \leq \mathrm{JSD}(p,q) \leq \log 2$

Reviews: Crooks 2016; Leclercq et al. 2016, arXiv:1606.06758 (Appendix A)
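A minimal sketch (not from the slides) of the two symmetrizations; kl() is the same helper as in the previous sketch:

from math import log2

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jeffreys(p, q):
    # Jeffreys symmetrization: sum of the two directed divergences
    return kl(p, q) + kl(q, p)

def jsd(p, q):
    # Jensen symmetrization: average divergence to the midpoint m
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p, q = [0.7, 0.3], [0.4, 0.6]
print(jeffreys(p, q), jsd(p, q))  # JSD is symmetric and bounded by 1 bit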

Page 10: Information-theoretic experimental design

• Goal: maximize the expected utility of the (expected) data over the space of experimental designs, $e^{*} = \operatorname{argmax}_e \int U(d,e)\, p(d|e)\, \mathrm{d}d$

• Utility functions for parameter inference:
• Maximal information gain: $U(d,e) = D_{\mathrm{KL}}\big(p(\theta|d,e) \,\|\, p(\theta)\big)$
• A-optimality: minimize the trace of the expected posterior covariance matrix
• D-optimality: minimize the determinant of the expected posterior covariance matrix (equivalently, maximize the determinant of the Fisher matrix)

Review: Leclercq et al. 2016, arXiv:1606.06758 (Appendix B)
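A toy sketch (not from the slides) of D-optimality for a linear Gaussian model $d = A\theta + n$ with unit noise covariance, where the Fisher matrix is $F = A^{\mathsf{T}} A$; the two design matrices below are hypothetical.

import numpy as np

# two hypothetical two-measurement designs for a two-parameter model
designs = {
    "design_1": np.array([[1.0, 0.0], [1.0, 0.1]]),  # nearly degenerate probes
    "design_2": np.array([[1.0, 0.0], [0.0, 1.0]]),  # orthogonal probes
}
for name, A in designs.items():
    F = A.T @ A                    # Fisher matrix for unit noise
    print(name, np.linalg.det(F))  # design_2 wins: det F = 1 vs 0.01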

Page 11: Information-theoretic experimental design

• Utility functions for model comparison:
• Maximal Bayes factor: it is hard to predict Bayes factors… but see Trotta 2007, arXiv:astro-ph/0703063
• Maximal mutual information between the model indicator and the data: e.g. Cavagnaro et al. 2010
• Maximal Jensen–Shannon divergence between posterior predictive distributions, i.e. the “supervised machine learning” utility: Vanlier et al. 2014; Leclercq, Lavaux, Jasche & Wandelt 2016, arXiv:1606.06758

Review: Leclercq et al. 2016, arXiv:1606.06758 (Appendix B)
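A toy sketch (not from the slides) of the mutual-information utility for model comparison: with the prior model probabilities as weights, $I(M;d)$ equals the weighted Jensen–Shannon divergence between the models' posterior predictive distributions. The two discretized predictives below are hypothetical.

from math import log2

w = [0.5, 0.5]            # prior model probabilities (assumed equal)
pred = [[0.6, 0.3, 0.1],  # p(d | M1) on a 3-bin grid of possible data
        [0.2, 0.3, 0.5]]  # p(d | M2)

# mixture p(d) = sum_m p(M_m) p(d | M_m)
mix = [sum(w[m] * pred[m][i] for m in range(2)) for i in range(3)]
# I(M; d) = sum_m w_m * D_KL( p(d | M_m) || p(d) ): the weighted JSD
I = sum(w[m] * pred[m][i] * log2(pred[m][i] / mix[i])
        for m in range(2) for i in range(3))
print(I)  # expected information gain (in bits) about which model is true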

Page 12: Supervised machine learning basics

• How to compute the information gain? For a parent node of $N$ samples split into children of sizes $N_1$ and $N_2$:
• parent entropy: $H(\text{parent})$
• child1 entropy: $H(\text{child}_1)$
• child2 entropy: $H(\text{child}_2)$
• weighted average entropy of children: $\frac{N_1}{N} H(\text{child}_1) + \frac{N_2}{N} H(\text{child}_2)$
• information gain for this split: $\mathrm{IG} = H(\text{parent}) - \frac{N_1}{N} H(\text{child}_1) - \frac{N_2}{N} H(\text{child}_2)$
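A minimal sketch (not from the slides) of the computation described above, for a binary-labelled parent node split into two children:

from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, child1, child2):
    n = len(parent)
    weighted = (len(child1) / n) * entropy(child1) \
             + (len(child2) / n) * entropy(child2)
    return entropy(parent) - weighted

parent = [0, 0, 0, 0, 1, 1, 1, 1]          # labels at the parent node
print(information_gain(parent, [0, 0, 0, 1], [0, 1, 1, 1]))  # ~0.189 bits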