![Page 1: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/1.jpg)
The Hearts and Minds of Data Science
Cecilia R. Aragon
Dept. of Human Centered Design & Engineering, eScience Institute
University of Washington
Image by Bill Howe
![Page 2: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/2.jpg)
What is data science?
Impact
![Page 3: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/3.jpg)
Exponential improvements in technology and algorithms are enabling a revolution in discovery
• A proliferation of sensors
• Ever more powerful models producing data that must be analyzed
• The creation of almost all information in digital form
• Dramatic cost reductions in storage
• Dramatic increases in network bandwidth
• Dramatic cost reductions and scalability improvements in computation
• Dramatic algorithmic breakthroughs in areas such as machine learning
Adapted from: Ed Lazowska et al., eScience Institute, UW
![Page 4: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/4.jpg)
Nearly every field of discovery is transitioning from “data poor” to “data rich”
Astronomy: LSST
Physics: LHC
Oceanography: OOI
Sociology: The Web
Biology: Sequencing
Economics: POS terminals
Neuroscience: EEG, fMRI
Ed Lazowska et al., eScience Institute, UW
![Page 5: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/5.jpg)
The Fourth Paradigm
1. Empirical + experimental
2. Theoretical
3. Computational
4. Data-Intensive
Adapted from: Ed Lazowska et al., eScience Institute, UW
![Page 6: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/6.jpg)
From data to knowledge to action
• The ability to extract knowledge from large, heterogeneous, noisy datasets – to move “from data to knowledge to action” – lies at the heart of 21st century discovery
• To remain at the forefront, researchers in all fields will need access to state-of-the-art data science methodologies and tools
• Data science is driven more by intellectual infrastructure (human skills) and software infrastructure (shared tools and services – digital capital) than by hardware
• These methodologies and tools will need to be designed to enable efficient discovery by the scientists who use them
Adapted from: Ed Lazowska et al., eScience Institute, UW
![Page 7: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/7.jpg)
Interdisciplinary Data Science Skills
• Programming Foundations and Data Abstractions
• Software Engineering and Systems
• Databases and Data Management
• User-Centered Software Design; Design Foundations for Interactive Systems
• High-Performance Scientific Computing
• Scalable and Data-Intensive Computing in the Cloud
• Data Mining / Machine Learning / Statistics
• Data and Information Visualization
UW eScience curriculum, http://escience.washington.edu/education/proposed-escience-curriculum
![Page 8: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/8.jpg)
Moore/Sloan Data Science Environment: $37.8M Initiative at UCB, NYU, UW
Graphic by Ray Hong and eScience Institute, UW
![Page 9: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/9.jpg)
DSE Goal
• Change the culture of universities to create a data science culture
• “we're a startup in the Eric Ries sense, a ‘human institution designed to deliver a new product or service under conditions of extreme uncertainty.’ Our outcomes are new forms of doing science in academic settings, and we're going against the grain of large institutional priors.” – Fernando Pérez
Fernando Pérez http://blog.fperez.org/2013/11/an-ambitious-experiment-in-data-science.html
![Page 10: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/10.jpg)
Six major thrusts
• Education and training
• Software tools and environments
• Career paths and alternative metrics
• Reproducibility and open science
• Working spaces and culture
• Ethnography and evaluation
![Page 11: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/11.jpg)
What is ethnography and why use it?
• Qualitative field-based method of inquiry
• Can focus on the dynamics and uses of data and computation in context
• Ecological validity
• The determinants of culture are often invisible or unspoken
• Understanding why and not just what
![Page 12: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/12.jpg)
Human centered data science
![Page 13: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/13.jpg)
Two human issues
1. How do we build tools that facilitate data-driven science today?
2. How do we ensure that we have enough people with data science skills working in academia to produce the next generation of scientific discoveries?
![Page 14: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/14.jpg)
![Page 15: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/15.jpg)
Why is this a human issue?
![Page 16: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/16.jpg)
What is the rate-limiting step in data understanding?
[Figure: amount of data in the world (vertical axis) plotted against time (horizontal axis); labeled points: 2.6 EB (1986) and 800 EB (2009)]
The World’s Technological Capacity to Store, Communicate, and Compute Information, M. Hilbert and P. López (2011), Science, 332(6025), 60-65.
(1 EB = 1,000,000,000,000,000,000 B = 10^18 bytes = one quintillion bytes = 100,000 Libraries of Congress)
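To make the scale of these figures concrete, the following Python sketch converts the slide's two labeled values (2.6 EB in 1986, 800 EB in 2009) into an overall growth factor, an implied average annual growth rate, and a doubling time. It assumes smooth exponential growth between the two points, which is a simplification introduced here for illustration and not a claim taken from the cited paper. The function name `growth_summary` is hypothetical.

```python
import math

EXABYTE = 10**18  # bytes, as defined on the slide


def growth_summary(start_eb, end_eb, start_year, end_year):
    """Return (growth factor, average annual growth rate, doubling time in years).

    Assumes constant exponential growth between the two points (illustrative only).
    """
    years = end_year - start_year
    factor = end_eb / start_eb
    annual_rate = factor ** (1 / years) - 1
    doubling_time = math.log(2) / math.log(1 + annual_rate)
    return factor, annual_rate, doubling_time


factor, rate, doubling = growth_summary(2.6, 800, 1986, 2009)
print(f"{2.6 * EXABYTE:.2e} bytes -> {800 * EXABYTE:.2e} bytes")
print(f"growth factor ~{factor:.0f}x, ~{rate:.0%} per year, "
      f"doubling roughly every {doubling:.1f} years")
```

Under this assumption the stored data doubles in under three years, which is the kind of growth the following slides compare against processing power and human capacity.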
![Page 17: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/17.jpg)
What is the rate-limiting step in data understanding?
[Figure: curves plotted against time: amount of data in the world; processing power (Moore’s Law)]
![Page 18: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/18.jpg)
What is the rate-limiting step in data understanding?
[Figure: curves plotted against time: amount of data in the world; processing power (Moore’s Law); effective processing power (Amdahl’s Law)]
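Amdahl's Law, which the slide names as the limit on effective processing power, can be stated in a few lines. The sketch below uses the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can be parallelized and n is the number of workers. The 95% parallel fraction in the example is an arbitrary illustrative value, not a figure from the talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's Law for a given parallel fraction and worker count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative value only: 95% of the work is assumed to be parallelizable.
for workers in (1, 8, 64, 1024):
    print(f"{workers:>5} workers -> {amdahl_speedup(0.95, workers):.2f}x")
```

However many workers are added, the speedup in this example stays below 1 / 0.05 = 20x, which is why effective processing power grows more slowly than raw processing power.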
![Page 19: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/19.jpg)
What is the rate-limiting step in data understanding?
[Figure: curves plotted against time: amount of data in the world; processing power (Moore’s Law); effective processing power (Amdahl’s Law); human cognitive capacity]
Idea adapted from “Less is More” by Bill Buxton (2001)
![Page 20: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/20.jpg)
Bridging the Data Science Gaps
[Figure: curves plotted against time: amount of data in the world; processing power (Moore’s Law); human cognitive capacity]
• Human centered data science
  – collaborative software, effective user interfaces
  – visualization
  – sociology, psychology, study of socio-technical systems
• Hardware
• Software
• Parallel computing
• Machine learning
![Page 21: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/21.jpg)
http://sciencewatch.com/articles/multiauthor-papers-onward-and-upward (C. King, Jul 2012)
Collaboration Growth: “Hyperauthorship”
Physics (3,221)
Medicine (2,459)
![Page 22: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/22.jpg)
Number of multi-author papers, 1998-2011
http://sciencewatch.com/articles/multiauthor-papers-onward-and-upward (C. King, Jul 2012)
![Page 23: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/23.jpg)
Collaboration is essential
• To understand vast and complex data sets
• To combine different skills
• To build interdisciplinary teams to solve the most challenging scientific, social, political, and human problems
![Page 24: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/24.jpg)
But it’s not so easy to build collaboration
• Many barriers are social, not just technical
  – Cummings and Kiesler study (2007) of 491 scientific collaborations, “Coordination costs and project outcomes in multi-university collaborations.” Research Policy, 36(10), 138-152.
  – Charlotte Lee, “Barriers to Adoption of Collaboration Technologies,” presented at CHI 09 workshop
    • too little is known about dynamics of complex work teams
    • collaboration across disciplines is difficult (different languages, methods)
    • distributed work is difficult (different organizational structures and processes)
    • need to study how to foster productive collaborations
    • the human infrastructure of cyberinfrastructure
![Page 25: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/25.jpg)
Data science involves solving both technical and social issues
![Page 26: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/26.jpg)
Astrophysics example
1. New machine learning approach to transient search
2. Image processing for supernova detection
3. Collaboration building
4. Cross-cultural issues
![Page 27: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/27.jpg)
Machine learning
"Supernova Recognition using Support Vector Machines," R. Romano, C. Aragon, and C. Ding, Proceedings of the 5th International Conference of Machine Learning Applications, Orlando, FL (2006). Winner of Best Application Paper Award.
![Page 28: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/28.jpg)
Impact on Supernova Search
![Page 29: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/29.jpg)
Imbalanced Data & Class Uncertainty
• Ratio of positives to negatives less than 1/10,000
• Many negatives in region of overlap
• Potentially mislabeled examples
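A class ratio below 1/10,000 makes plain accuracy a misleading metric, which is one reason imbalanced data needs special handling. The sketch below uses synthetic labels (not the supernova data) and a trivial "always negative" classifier to show the effect. All names in it are hypothetical.

```python
def confusion_counts(labels, predictions):
    """Return (true positives, false positives, true negatives, false negatives)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy_and_recall(labels, predictions):
    """Accuracy over all examples, and recall over the positive class."""
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    accuracy = (tp + tn) / len(labels)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Synthetic data: 5 positives among 50,000 negatives (ratio 1/10,000).
labels = [1] * 5 + [0] * 50_000
always_negative = [0] * len(labels)

acc, rec = accuracy_and_recall(labels, always_negative)
print(f"accuracy={acc:.4%} recall={rec:.0%}")
```

The classifier that never reports a candidate scores above 99.99% accuracy while finding none of the positives, so evaluation for this kind of search has to look at recall and false-positive counts instead.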
![Page 30: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/30.jpg)
Image processing
"A Fast Contour Descriptor Algorithm for Supernova Image Classification," C. Aragon and D. Aragon, SPIE Symposium on Electronic Imaging: Real-Time Image Processing, San Jose, CA (2007).
![Page 31: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/31.jpg)
Shape Detection Scores
• Many “bad subtractions” have shapes which distinguish them from good supernova candidates.
• Add “roundness” scores using Fourier contour descriptor algorithm
![Page 32: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/32.jpg)
Two New Features Added
• R1 – object contour eccentricity; ratio of powers in lowest order negative and positive Fourier contour descriptors
$$R_1 = \frac{|F_{-1}|^2}{|F_1|^2}$$
• R2 – object contour irregularity; power in higher order Fourier contour descriptors divided by total power
$$R_2 = \frac{\sum_{|k|>1} |F_k|^2}{\sum_{k \neq 0} |F_k|^2}$$
"A Fast Contour Descriptor Algorithm for Supernova Image Classification," C. Aragon and D. Aragon, SPIE Symposium on Electronic Imaging: Real-Time Image Processing, San Jose, CA (2007).
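The two scores can be illustrated directly from their verbal definitions: R1 as the ratio of power in the lowest-order negative and positive descriptors, and R2 as higher-order power divided by total power (excluding the k = 0 term). The sketch below is a straightforward, unoptimized reading of those definitions and not the paper's implementation. It assumes a closed contour sampled uniformly and traversed counterclockwise, a descriptor normalization of 1/N, and hypothetical helper names.

```python
import cmath
import math


def fourier_descriptors(contour):
    """Return {k: F_k} for a closed contour given as a list of (x, y) points."""
    n = len(contour)
    z = [complex(x, y) for x, y in contour]
    return {
        k: sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)) / n
        for k in range(-(n // 2), n - n // 2)
    }


def roundness_scores(contour):
    """Return (R1, R2): eccentricity and irregularity of the contour."""
    power = {k: abs(f) ** 2 for k, f in fourier_descriptors(contour).items()}
    r1 = power[-1] / power[1]
    total = sum(p for k, p in power.items() if k != 0)
    higher = sum(p for k, p in power.items() if abs(k) > 1)
    return r1, higher / total


def ellipse(a, b, n=64):
    """Sample an axis-aligned ellipse counterclockwise at n points."""
    return [(a * math.cos(2 * math.pi * m / n), b * math.sin(2 * math.pi * m / n))
            for m in range(n)]


for name, shape in (("circle", ellipse(1, 1)), ("ellipse 2:1", ellipse(2, 1))):
    r1, r2 = roundness_scores(shape)
    print(f"{name}: R1={r1:.4f} R2={r2:.4f}")
```

A circle gives R1 and R2 near zero, while a 2:1 ellipse gives R1 = ((a - b) / (a + b))^2 = 1/9 with R2 still near zero, matching the "elliptical vs. circular" interpretation of R1.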
![Page 33: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/33.jpg)
Fast Algorithm
• Find good supernova candidates
  – R1 measures ||F-1|| vs. ||F1|| (elliptical vs. circular)
  – R2 measures power in higher terms vs. ||F1|| and ||F-1||
• Advantages of these R1 and R2:
  – Scale invariant (due to normalization by F1)
  – Rotation invariant (only magnitudes used, no phase)
  – Fast! Only two transform terms, and they are complex conjugates (F1, F-1), so intermediate results are shared
• Performance depends on how fast we can obtain (for each point zi = xi + iyi on the contour) the primitives used to form F1 and F-1:
$$x_i \cos\left(\tfrac{2\pi i}{N}\right),\quad y_i \sin\left(\tfrac{2\pi i}{N}\right),\quad x_i \sin\left(\tfrac{2\pi i}{N}\right),\quad y_i \cos\left(\tfrac{2\pi i}{N}\right)$$
• In the end:
  – 41% reduction in false positives, no measurable increase in running time
"A Fast Contour Descriptor Algorithm for Supernova Image Classification," C. Aragon and D. Aragon, SPIE Symposium on Electronic Imaging: Real-Time Image Processing, San Jose, CA (2007).
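One plausible way to get both scores from only two transform terms is sketched below. F1 and F-1 are built from the same four running sums of x and y against the cosine and sine of 2πm/N, and the higher-order power for R2 is obtained from Parseval's identity (total power equals the mean squared magnitude of the contour points) rather than from further transform terms. The use of Parseval's identity is an assumption made here for illustration; the slides do not say how the paper computes R2. The names and the 1/N normalization are likewise hypothetical.

```python
import math


def fast_roundness_scores(contour):
    """Return (R1, R2) from a single pass over the contour points."""
    n = len(contour)
    sum_x = sum_y = sum_sq = 0.0
    sum_xc = sum_xs = sum_yc = sum_ys = 0.0
    for m, (x, y) in enumerate(contour):
        angle = 2.0 * math.pi * m / n
        c, s = math.cos(angle), math.sin(angle)
        sum_x += x
        sum_y += y
        sum_sq += x * x + y * y
        # The four shared primitives: x*cos, x*sin, y*cos, y*sin.
        sum_xc += x * c
        sum_xs += x * s
        sum_yc += y * c
        sum_ys += y * s
    f0 = complex(sum_x, sum_y) / n                            # centroid term
    f1 = complex(sum_xc + sum_ys, sum_yc - sum_xs) / n        # F_1
    f_neg1 = complex(sum_xc - sum_ys, sum_yc + sum_xs) / n    # F_{-1}
    p1, p_neg1 = abs(f1) ** 2, abs(f_neg1) ** 2
    total = sum_sq / n - abs(f0) ** 2         # Parseval: power in all k != 0
    higher = max(total - p1 - p_neg1, 0.0)    # power in |k| > 1
    return p_neg1 / p1, higher / total


def star(n=64, bump=0.3, lobes=5, center=(0.0, 0.0)):
    """Sample a lobed, roughly circular contour counterclockwise at n points."""
    points = []
    for m in range(n):
        t = 2.0 * math.pi * m / n
        r = 1.0 + bump * math.cos(lobes * t)
        points.append((center[0] + r * math.cos(t), center[1] + r * math.sin(t)))
    return points


r1, r2 = fast_roundness_scores(star(center=(3.0, -2.0)))
print(f"R1={r1:.4f} R2={r2:.4f}")
```

Because the k = 0 term is subtracted out, moving the contour to a different center does not change either score, and because only magnitudes are used, neither does rotating it.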
![Page 34: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/34.jpg)
Collaboration building
“Context-Linked Virtual Assistants in Distributed Teams: An Astrophysics Case Study," S. Poon, R. Thomas, C. Aragon, B. Lee, CSCW 2008, San Diego, CA (2008). Best Paper Award Nominee.
![Page 35: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/35.jpg)
Lightweight, context-linked tools for collaborative work
• “Context-Linked Virtual Assistants in Distributed Teams: An Astrophysics Case Study," S. Poon, R. Thomas, C. Aragon, B. Lee, CSCW 2008, San Diego, CA (2008). Best Paper Award Nominee.
![Page 36: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/36.jpg)
Discussion and Benefits
• Context-linked information aids in communication and collaborative work
  – Provides continuous environmental data
  – Informs awareness of time, schedule
  – Crucial in coordinating time-critical tasks
  – Reduction of coordination efforts to establish common ground (less time spent figuring out state of tasks, beneficial when working under time pressure)
• Lightweight, easy-to-use
  – Did not require elaborate setup
“Context-Linked Virtual Assistants in Distributed Teams: An Astrophysics Case Study," S. Poon, R. Thomas, C. Aragon, B. Lee, CSCW 2008, San Diego, CA (2008). Best Paper Award Nominee.
![Page 37: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/37.jpg)
Cross-cultural issues
![Page 38: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/38.jpg)
Improved cross-cultural communication via context-linked software tools
"No Sense of Distance: Improving Cross-Cultural Communication with Context-Linked Software Tools," Cecilia R. Aragon and Sarah S. Poon, iConference (2011).
![Page 39: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/39.jpg)
Measurable improvements from human centered approach
• Measurable improvements to transient search and data analysis in many areas.
  – Improved image processing algorithms:
    • Fourier contour analysis (41% reduction in false positives)
  – New machine learning algorithms for astronomical transient search:
    • support vector machines
    • boosted decision trees (90% reduction in false positives)
  – Improved control interfaces for scanning and vetting
    • approx. 70% speedup per image
• Search + Scanning: approx. 90% labor savings
    • from 6-8 people working ~4 hrs/day to 1 person working 1 hr/day
![Page 40: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/40.jpg)
The sociotechnical ecosystem of science
![Page 41: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/41.jpg)
Tools that fit within the sociotechnical ecosystem of science
• IPython - Fernando Pérez
  – base for interactive scientific environment
  – enables exploratory computing
  – browser-based notebooks (rich text, code, plots)
Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. URL: http://ipython.org
![Page 42: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/42.jpg)
Tools that fit within the sociotechnical ecosystem of science
• mpld3, a toolkit for visualizing matplotlib graphics in-browser via d3 – Jake Vanderplas
http://jakevdp.github.io/blog/2014/01/10/d3-plugins-truly-interactive/
![Page 43: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/43.jpg)
Research with Social Media Text Data
• Ranges along quantitative – qualitative spectrum
• Quantitative
  – Good: Large quantities of data, efficient
  – Bad: Relatively shallow, superficial
• Qualitative
  – Good: deep, explanatory conclusions
  – Bad: Small, focused samples, inefficient
• How to get best of both worlds?
  – Visual analytics, combining machine learning with visualization and qualitative research
"Statistical Affect Detection in Collaborative Chat," Michael Brooks, Katie Kuksenok, Megan Torkildson, Daniel Perry, John Robinson, Paul Harris, Ona Anicello, Taylor Scott, Ariana Zukowski, Cecilia Aragon. CSCW '13 (2013)
![Page 44: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/44.jpg)
Integrate machine learning and interactive visualization into a qualitative research workflow
• Conclusions based on large data sets
• Maintain context, nuance, depth
• Select tools based on transparency and understandability, not just efficiency
"Statistical Affect Detection in Collaborative Chat," Michael Brooks, Katie Kuksenok, Megan Torkildson, Daniel Perry, John Robinson, Paul Harris, Ona Anicello, Taylor Scott, Ariana Zukowski, Cecilia Aragon. CSCW '13 (2013)
![Page 45: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/45.jpg)
Human issues in data science
• Ethics: Lilly Irani and Six Silberman, UCSD, Turkopticon and crowdsourcing
Irani, Lilly C., and M. Silberman. "Turkopticon: Interrupting worker invisibility in Amazon Mechanical Turk." Proc. of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2013, 611–620.
• Design: What happens when you crowd-source design? Daniela Rosner, UW (systems that guarantee banal results?)
Rosner, Daniela, and Jonathan Bean. "Big data, diminished design." interactions 21, no. 3. In press.
• Data expectations: What does data really mean across different fields? Gina Neff, B. Fiore-Silfvast
Brittany Fiore-Silfvast and Gina Neff, UW Dept of Communication, “Communication, Mediation, and the Expectations of Data: Data Valences across Health and Wellness Communities.” 2014. In press.
![Page 46: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/46.jpg)
Human centered data science
![Page 47: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/47.jpg)
Two human issues
1. How do we build tools that facilitate data-driven science today?
2. How do we ensure that we have enough people with data science skills working in academia to produce the next generation of scientific discoveries?
![Page 48: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/48.jpg)
Interdisciplinary Data Science Skills
• Programming Foundations and Data Abstractions
• Software Engineering and Systems
• Databases and Data Management
• User-Centered Software Design; Design Foundations for Interactive Systems
• High-Performance Scientific Computing
• Scalable and Data-Intensive Computing in the Cloud
• Data Mining / Machine Learning / Statistics
• Data and Information Visualization
eScience curriculum, http://escience.washington.edu/education/proposed-escience-curriculum
![Page 49: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/49.jpg)
Human issues of data science in academia – Fernando Pérez
• Academia favors individualism, hyper-specialization and novelty
• Collaborations and interdisciplinary work are less rewarded
• Writing software to enable new science is not rewarded (it’s time not spent writing papers)
• Scientists have been trained to think of computation as not “real science”
• Methodologists (e.g. computer scientists, statisticians) are not rewarded for creating applications
• The skills of data science are in tremendous demand in industry today = “big data brain drain” (see Jake Vanderplas blog)
http://blog.fperez.org/2013/11/an-ambitious-experiment-in-data-science.html
![Page 50: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/50.jpg)
The Big Data Brain Drain
• “the skills required to be a successful scientific researcher are increasingly indistinguishable from the skills required to be successful in industry.”
• “open, well-documented, and well-tested scientific code is essential not only to reproducibility in modern scientific research, but to the very progression of research itself.”
–Jake Vanderplas
http://jakevdp.github.io/blog/2013/10/26/big-data-brain-drain/
![Page 51: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/51.jpg)
Human issues of data science in academia
• So what are we going to do about this?
• We know that data science skills are necessary for scientific discovery
• Who is going to teach these skills?
• Academic scientific disciplines have full schedules, tend not to teach computation and data skills or only via a course or two.
• And when people acquire data science skills, there is no job path in academia
![Page 52: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/52.jpg)
Data science as an academic department
• Analogy: how the discipline of computer science was created
• No undergraduate CS degree at Caltech in the 1980s
![Page 53: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/53.jpg)
The emerging discipline of computer science
• “Only failed mathematicians go into computer science.” – Unnamed mathematician, Caltech, 1980s
• “I wrote a program once. It wasn’t that hard.” – Unnamed physicist, Berkeley, 1990s
• “But… this programming… what if it’s all just a fad?” – Unnamed physics professor, U Penn, 2000s
![Page 54: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/54.jpg)
Interdisciplinary departments
• Another exemplar: our own interdisciplinary department
  – Human Centered Design and Engineering
  – Tenure case does not involve only publishing in the “right” journals
  – Part of the case is that you present evidence of your impact (conferences, journals, altmetrics)
  – New department
  – But: fastest-growing department in the College of Engineering at UW
![Page 55: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/55.jpg)
The hearts and minds of data science
![Page 56: Hearts and Minds of Data Science - Human-centered design … · How do we build tools that facilitate data-driven science today? 2. ... – base for interactive scientific environment](https://reader033.vdocument.in/reader033/viewer/2022050205/5f584fde76ae47007438511c/html5/thumbnails/56.jpg)
Questions?
Cecilia R. Aragon Human Centered Design & Engineering
University of Washington