Displaying your data

Practical Computing for Scientists

Physics 91SI

M. Bellis

Department of Physics, Stanford University

April 19th, 2011

M. Bellis April ’11 Data display 1 / 21

Outline

1 Understanding your data

Matplotlib

Displaying your data

2 Summary

Introduction

• Research is usually about working with new data.

• Or about working with old data in a new way.

• It behooves you to understand your data.

Data

• Given some distribution of data, look at:

• Mean (µ)

• Standard deviation (σ)

• Make use of built-in functions in numpy.

• Try this with the lists of data you were sent.

legba:~> ipython

In [1]: from numpy import *

In [2]: x = [0, 1, 2, 3, 4]

In [3]: mean(x)

Out[3]: 2.0

In [4]: std(x)

Out[4]: 1.4142135623730951
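The same numbers can be obtained without the wildcard import, which makes it clearer where `mean` and `std` come from. The sketch below is only an illustration: the `ddof=1` line is an addition that is not in the session above. It shows the sample (N - 1) standard deviation, since `std` by default divides by N.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])

mu = np.mean(x)                    # arithmetic mean
sigma_pop = np.std(x)              # population standard deviation (divides by N)
sigma_sample = np.std(x, ddof=1)   # sample standard deviation (divides by N - 1)

print(mu, sigma_pop, sigma_sample)
```

For large data sets the two standard deviations are nearly identical; for a handful of points, as here, the difference is visible.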

Matplotlib

Figure: http://matplotlib.sourceforge.net/

• Plotting library for Python.

• Original author: John Hunter

• pyplot

• Included in pylab, along with numpy.

• pylab aims to be a replacement for MATLAB.

• So what does that give us?

Coding

Import pylab and use the array object.

legba:~> ipython

In [1]: from pylab import *

In [2]: x = array([0,1,2,3,4])

In [3]: y = array([0,1,2,3,4])

In [4]: plot(x,y)

Out[4]: [<matplotlib.lines.Line2D object at 0x36522d0>]

But we don’t see the plot yet.

In [5]: show()

plot

http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot

Default is a blue line connecting points, but there are other plotting options.

Close plot window, and try some of these:

In [6]: plot(x,y,'r--')

In [6]: plot(x,y,'g-')

In [6]: plot(x,y,'ks-',linewidth=4)

In [6]: plot(x,y,'co')

In [6]: axes().set_xlim(-10,10) # Set the range on the x-axis

Note that you do not have to show() after each one.

Note also that unless you close the window, these are overlaid on one another.
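The format strings above pack a color, a marker and a line style into a few characters. As a rough sketch, the same options can be written out in a script and inspected afterwards. The variable names and the non-interactive Agg backend are choices made here for illustration, not part of the lecture. The sketch calls `set_xlim` on an explicit `ax` object because, as far as I know, a bare `axes()` call in recent Matplotlib versions adds a new axes rather than returning the current one.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])

fig, ax = plt.subplots()

# Each plot() call returns a list holding one Line2D object;
# the trailing comma unpacks that single element.
dashed_red, = ax.plot(x, y, "r--")                   # red dashed line
thick_squares, = ax.plot(x, y, "ks-", linewidth=4)   # black squares joined by a thick solid line
cyan_dots, = ax.plot(x, y, "co")                     # cyan circles, no connecting line

ax.set_xlim(-10, 10)  # set the range on the x-axis

# All three lines live on the same axes, so they are overlaid.
print(len(ax.lines), "lines on the axes")
print(dashed_red.get_linestyle(), thick_squares.get_marker(), cyan_dots.get_marker())
```

Keeping a handle on each line makes it possible to change its properties later, for example with `dashed_red.set_linewidth(2)`.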

Your plots

The figure exists in global memory so it is easy to save it as a file.

In [9]: savefig('myplot.png')
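For a plot made in a script rather than at the interactive prompt, a minimal sketch might look like the following. The Agg backend, the temporary output directory and the `dpi` value are assumptions made here for illustration. On a remote machine without a display, a non-interactive backend lets `savefig` write the file without opening a window.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: running without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])

plt.plot(x, y, "ks-")
plt.xlabel("x")
plt.ylabel("y")

# Hypothetical output location; any writable path works.
outdir = tempfile.mkdtemp()
outfile = os.path.join(outdir, "myplot.png")

plt.savefig(outfile, dpi=150)  # the file format is inferred from the extension
plt.close()                    # release the figure once it is on disk

print("wrote", outfile)
```

Changing the extension to `.pdf` should give a vector image instead of a bitmap, which usually scales better in papers and slides.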

Use https://afs.stanford.edu to download the image file to your desktop. Upload this image to the Google Doc for this lecture.

https://docs.google.com/present/edit?id=0AaEmDaJ8A2rAZGhwc3pudzhfOTM1Y21wcDk5bjk&hl=en

Let’s take a look at your plots!

Your data

• If you are lucky, at some point in your life you will get to work on a problem for which the answer is not known.

• There’s a reason why people do Sudoku/crossword puzzles/word jumbles: the joy of solving a puzzle.

• Your research experience will hopefully help you learn how to collect data and glean information from it.

• You also learn how to present that data to others.

• Just as importantly, you learn how to present that data to yourself so that you can make accurate and precise statements about what you are measuring.

• Never forget that the collection of experimental data is a means to an end... not simply an end in itself.

Tufte

• Parts of this lecture are motivated by the following.

• Tufte, E., The Visual Display of Quantitative Information

• Can’t recommend this enough.

• Examples of the good, the bad and the ugly in the world of charts, plots and graphs.

• Can find Prof. Cabrera’s old monopole data as an example of a well-constructed plot (p.39).

Tufte

Tufte lists his Principles of Graphical Excellence (p.51)

• Graphical excellence is the well-designed presentation of interesting data - a matter of substance, of statistics, and of design.

• Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.

• Graphical excellence is nearly always multivariate.

• Graphical excellence requires telling the truth about the data.

• Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.

Binning

• Even the simplest assumptions about how to display your data can affect the conclusions you or others draw.

• Histogramming your data.

• Freedom to choose bin size.

• Can affect fits to data.

• The following is the same 100 random events from a Gaussian distribution.

• µ = 0.0, σ = 1.0

• Plotted with 3 different bin sizes.
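The comparison described above can be approximated with a short script. This is a sketch under stated assumptions, not the code used to make the lecture plots: the seed, the variable names and the choice to bin over the full span of the data are all made up here for illustration. It only counts events per bin; it does not draw anything.

```python
import numpy as np

rng = np.random.default_rng(seed=0)                 # fixed seed so the run is repeatable
events = rng.normal(loc=0.0, scale=1.0, size=100)   # mu = 0.0, sigma = 1.0

results = {}
for nbins in (10000, 100, 10):
    # Bin the same 100 events over the same range each time,
    # so that only the bin width changes between passes.
    counts, edges = np.histogram(
        events, bins=nbins, range=(events.min(), events.max())
    )
    results[nbins] = counts
    print(
        f"{nbins:>5} bins: width = {edges[1] - edges[0]:.4f}, "
        f"tallest bin = {counts.max()}, empty bins = {np.sum(counts == 0)}"
    )
```

With far more bins than events, most bins are necessarily empty, and with very few bins the shape is smeared out. Passing the same `bins` value to pylab's `hist` should draw the corresponding histogram.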

Binning

Figure: The same 100 events histogrammed in three panels: 10000 bins, 100 bins, and 10 bins. x-axis: Arbitrary measurement (-4 to 4); y-axis: Number of counts.

Binning

Figure: The same three histograms (x-axis: Arbitrary measurement, -4 to 4; y-axis: Number of counts), each fitted with a Gaussian. Fit results:

10000 bins: Constant = 1.026 ± 0.104, Mean = -0.08034 ± 10.68848, Sigma = 9.954 ± 8.078

100 bins: Constant = 3.076 ± 0.466, Mean = -0.003924 ± 0.209248, Sigma = 1.341 ± 0.262

10 bins: Constant = 37.25 ± 4.99, Mean = -0.03304 ± 0.13808, Sigma = 1.081 ± 0.115

• Using same fitting tool, wound up with 3 very different widths.

• Couldn’t even get a sense of the parent distribution from first plot.

• In general the binning should be motivated by the resolution of your measurements.

• If your detector/ruler/samples have resolution x, you don’t want to plot your data with bins of width x/10.

• Don’t not think about your data!

• Even if it seems to be super trivial!

• Your data should tell a clear story.
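One way to act on the resolution guideline is to derive the bin edges from the resolution instead of picking a bin count by hand. The helper name `edges_from_resolution`, the seed and the resolution value of 0.5 are hypothetical, chosen only for illustration. This is a sketch, not the procedure used to make the plots above.

```python
import numpy as np

def edges_from_resolution(data, resolution):
    """Return bin edges of width `resolution` that cover all of `data`."""
    lo = np.floor(data.min() / resolution) * resolution
    hi = np.ceil(data.max() / resolution) * resolution
    nbins = max(int(round((hi - lo) / resolution)), 1)
    return np.linspace(lo, hi, nbins + 1)

rng = np.random.default_rng(seed=0)
events = rng.normal(loc=0.0, scale=1.0, size=100)

resolution = 0.5   # hypothetical measurement resolution, same units as the data
edges = edges_from_resolution(events, resolution)
counts, _ = np.histogram(events, bins=edges)

# Unbinned estimates do not depend on any binning choice,
# so they make a useful cross-check on a fitted width.
print("bin width:", edges[1] - edges[0])
print("number of bins:", len(counts))
print("unbinned mean:", events.mean(), "unbinned std:", events.std(ddof=1))
```

If a fit to the histogram gives a width very different from the unbinned standard deviation, that is a hint that the binning, not the data, is driving the result.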

Napoleon’s march

Figure: Charles Joseph Minard’s display of Napoleon’s excursion into Russia.

Train schedule

Figure: 1880’s French train schedule.

John Snow

Figure: John Snow’s map of cholera outbreak (1854).

Cosmic Microwave Background

Figure: Comparison of COBE data with blackbody prediction.

Ratio

Figure: Ratio of cross-sections for e+e− → hadrons to e+e− → µ+µ−, as a function of center-of-mass energy.

Household debt

Figure: David Bein. Ratio of household debt vs. US GDP. http://www.npr.org/blogs/money/2009/02/household_debt_vs_gdp.html

Summary

• Collecting your data is not an end in itself.

• Your data tell a story. Visualizations help us see that story.

• Lots of good tools in Python and Matplotlib!
