A research group’s use-cases for PyData tools
Samuel Harrold, Astrophysics PhD Student, UT Austin
2015-05-22 @ Continuum Analytics, Austin, TX

Motivation

● In 2011:
  ○ Research group mostly used bash scripts, awk, Fortran, IDL, IRAF.
  ○ Pipeline was tightly coupled with old computers, cameras, camera software.
● Goals for new computers and camera:
  ○ Make pipeline loosely coupled, cross-platform.
  ○ Develop skills for non-academic job market.
● Led research group in adopting Python tools.

Summary

● Conflict of interest: engineering vs publishing papers.
● To adopt best practices from industry, science needs more tools that lower the entry barrier.
  ○ Example: It’s hard to mine your data if you don’t know how to create a database.

Outline

● Motivation

● Use-cases

● FAQ from researchers

Use of some PyData tools

● Anaconda: Environment management.
● IPython Notebooks: Copy-paste code share.
● scikit-image: Detecting stars.
● pandas: Data organization.
● statsmodels, emcee: Robust statistics.
● astropy, astroML: Astronomy-specific.

Use-case: Star brightness vs time

● “Time-series photometry.”
● Objective:
  ○ Extract relative brightness of stars from images during acquisition.
● Status:
  ○ Developed to be good enough for internal use, but not made robust for distribution.
  ○ Conflict of interest: engineering vs publishing papers.

https://github.com/ccd-utexas/tsphot
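
The objective of this use-case, extracting the relative brightness of a star from a sequence of images, can be sketched with NumPy alone. This is a minimal illustration, not the tsphot implementation: the function names, the noiseless Gaussian star profiles, the fixed star positions, and the known sky level are all assumptions made for the example. Dividing the target's flux by a comparison star's flux in the same frame cancels variations that affect both stars equally.

```python
import numpy as np


def synthetic_frame(shape, stars, sky_level, sigma=1.5):
    """Build a noiseless test image: flat sky plus Gaussian-profile stars.

    Hypothetical helper that stands in for a real camera frame.
    """
    rows, cols = np.indices(shape)
    frame = np.full(shape, sky_level, dtype=float)
    for row, col, flux in stars:
        dist_sq = (rows - row) ** 2 + (cols - col) ** 2
        frame += flux * np.exp(-dist_sq / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return frame


def aperture_flux(image, row, col, radius):
    """Sum the pixel values inside a circular aperture centered on (row, col)."""
    rows, cols = np.indices(image.shape)
    in_aperture = (rows - row) ** 2 + (cols - col) ** 2 <= radius**2
    return float(image[in_aperture].sum())


def relative_brightness(image, target, comparison, radius, sky_level):
    """Ratio of sky-subtracted target flux to comparison-star flux."""
    sky_subtracted = image - sky_level
    target_flux = aperture_flux(sky_subtracted, target[0], target[1], radius)
    comparison_flux = aperture_flux(
        sky_subtracted, comparison[0], comparison[1], radius
    )
    return target_flux / comparison_flux


# Assumed geometry for the example: two well-separated stars in a small frame.
SHAPE = (32, 64)
TARGET = (16, 16)
COMPARISON = (16, 48)
SKY_LEVEL = 10.0

# One frame per time step; the target dims in the middle frame.
light_curve = []
for target_flux in (1000.0, 800.0, 1000.0):
    frame = synthetic_frame(
        SHAPE,
        [(*TARGET, target_flux), (*COMPARISON, 2000.0)],
        SKY_LEVEL,
    )
    light_curve.append(
        relative_brightness(
            frame, TARGET, COMPARISON, radius=6, sky_level=SKY_LEVEL
        )
    )

print(light_curve)
```

A real pipeline would also need to locate the stars, estimate the sky level from the image, and handle noise; those steps are left out here.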

Use-case: Data mining platform

● Objective:
  ○ Predict which unobserved white dwarf stars pulsate.
    ■ What stars are there? From catalogs.
    ■ Which stars are published (non)pulsators? From papers.
    ■ Which stars are unpublished (non)pulsators? From our data.
● Status:
  ○ Shut down due to under-use.
    ■ Users preferred grep + Excel over pandas.
    ■ Users didn’t want to maintain a MySQL database.
  ○ Conflict of interest: engineering vs publishing papers.

http://www.slideshare.net/SamuelHarrold/20140409-harrold-dataminingdemostellarseminar
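
The three questions in this use-case (what stars exist, which are labeled in papers, which are labeled in the group's own data) map onto a small relational schema. The sketch below uses Python's built-in sqlite3 module with an in-memory database rather than the MySQL database the platform used. The table names, column names, and star names are placeholders invented for illustration, not the platform's schema or real catalog entries.

```python
import sqlite3

# In-memory database: nothing to install or maintain for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE catalog (star TEXT PRIMARY KEY);
    CREATE TABLE labels (
        star     TEXT REFERENCES catalog(star),
        source   TEXT,     -- 'papers' or 'our data'
        pulsates INTEGER   -- 1 = pulsator, 0 = non-pulsator
    );
    """
)

# Placeholder rows (made-up names, not real stars).
conn.executemany(
    "INSERT INTO catalog VALUES (?)",
    [("WD-A",), ("WD-B",), ("WD-C",), ("WD-D",)],
)
conn.executemany(
    "INSERT INTO labels VALUES (?, ?, ?)",
    [("WD-A", "papers", 1), ("WD-B", "our data", 0)],
)

# Stars with a known label: candidate training data for a predictor.
labeled = conn.execute(
    "SELECT star, source, pulsates FROM labels ORDER BY star"
).fetchall()

# Stars in the catalog with no label yet: the ones to predict.
unobserved = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.star
        FROM catalog AS c
        LEFT JOIN labels AS l ON l.star = c.star
        WHERE l.star IS NULL
        ORDER BY c.star
        """
    )
]

print(labeled)
print(unobserved)
conn.close()
```

The LEFT JOIN keeps every catalog row and leaves the label columns NULL where no paper or observation covers the star, which is how the unlabeled stars are selected.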

Use-case: Reproducible research

● Objective:
  ○ Compute the physical quantities of a binary star system from time-series photometry.
● Status:
  ○ Citable code on GitHub with DOI from zenodo.org.
  ○ Distributable code published to PyPI.
  ○ Conflict of interest: engineering vs publishing papers.

https://github.com/stharrold/Harrold_2015_SDSSJ1600
https://pypi.python.org/pypi/binstarsolver

FAQ from researchers

● Questions:
  ○ “Why don’t you use ___?”
  ○ “How does this help publish more papers?”
  ○ “Why should I learn another language?”
● Answers:
  ○ “Look how quickly I can do ___.”
  ○ Examples justify taking time to learn new skills.
  ○ NSF Data Management and Sharing requirements: https://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp
  ○ TIOBE code popularity index: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
  ○ Jake VanderPlas’s blog post on data science and academia: https://jakevdp.github.io/blog/2014/08/22/hacking-academia/