-
Designing Curriculum for A.I.
Costas J. Spanos for David Culler
Costas J. Spanos
Andrew S. Grove Distinguished Professor, EECS, UC Berkeley
Director, CITRIS & the Banatao Institute
CTO, Berkeley Education Alliance for Research in Singapore
5/29/2017
5/29/17 NTU 1
-
Deep Learning
-
Inferential Thinking, or estimating from incomplete data
• Generalizations (predictions, parameter
estimates, and conclusions) that go beyond
describing the given data;
• Use of data as evidence for those generalizations;
• Conclusions that express a degree of uncertainty,
accounting for the variability or uncertainty
inherent in generalizing beyond the immediate
data to a population or a process.
-
Sampling and Estimation
Sampling: the act of making inferences about populations.
Random sampling: when each observation is identically and independently distributed.
Statistic: a function of sample data containing no unknowns (e.g. average, median, standard deviation, etc.).
A statistic is a random variable. Its distribution is a sampling distribution.
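The statement that a statistic is itself a random variable can be made concrete with a small simulation. The following is a minimal sketch, not part of the original slides: the population (a skewed, exponential-like set of values), the sample size of 50, and the helper name `sample_means` are all assumptions chosen for illustration. It draws many random samples, computes the mean of each, and compares the spread of those means with the population SD divided by the square root of the sample size.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical population, invented for illustration: 10,000 skewed values.
population_rng = random.Random(1)
population = [population_rng.expovariate(1.0) for _ in range(10_000)]

# Each entry of `means` is one realization of the statistic "sample mean".
means = sample_means(population, sample_size=50, num_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print("population mean:        ", round(pop_mean, 3))
print("mean of sample means:   ", round(statistics.fmean(means), 3))
print("SD of sample means:     ", round(statistics.stdev(means), 3))
print("population SD / sqrt(n):", round(pop_sd / 50 ** 0.5, 3))
```

The list `means` is an empirical picture of the sampling distribution: its center should sit near the population mean, and its spread should be close to the last printed value.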
-
Inference
Basic Statistics
• Mean from data
• Variance from data
• Assume distribution is
normal
• Worry about estimation
precision and sample size
implications.
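The basic-statistics recipe in the list above (mean and variance from data, a normality assumption, then attention to precision and sample size) can be sketched in a few lines. This is an illustrative sketch only: the measurements are invented, the function names are hypothetical, and the fixed multiplier z = 1.96 is the usual normal approximation for a 95% interval. For a sample this small, a Student-t quantile would give a somewhat wider interval.

```python
import math
import statistics


def mean_confidence_interval(data, z=1.96):
    """Return (mean, sample variance, approximate 95% CI) under a normal assumption."""
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)  # sample variance, n - 1 denominator
    std_error = math.sqrt(variance / n)   # estimated SD of the sample mean
    return mean, variance, (mean - z * std_error, mean + z * std_error)


def required_sample_size(variance, margin, z=1.96):
    """Smallest n for which z * sqrt(variance / n) is at most `margin`."""
    return math.ceil(z ** 2 * variance / margin ** 2)


# Hypothetical measurements, invented for illustration.
data = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.5, 9.7, 10.2]

mean, variance, (low, high) = mean_confidence_interval(data)
n_needed = required_sample_size(variance, margin=0.1)

print("mean:", round(mean, 3), "variance:", round(variance, 4))
print("approx. 95% CI:", (round(low, 3), round(high, 3)))
print("samples needed for a +/- 0.1 margin:", n_needed)
```

Halving the margin roughly quadruples the required sample size, which is the practical meaning of "worry about estimation precision and sample size implications."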
Real Life Generalization
-
Data and more Data…
6/1/17 Inclusive Innovation
-
The AI (R)evolution: from expert to data driven
-
Data Science at the Heart of a
21st Century University
David Culler
University of California, Berkeley
http://data.berkeley.edu
-
The vision …
In the 21st century every college
student should be prepared to
understand and develop points of view
based on the analysis of data as well
as evaluate arguments made by others
12/12/16 NAS UCB DSed
-
Vectors of Change
Nearly every field of discovery is transitioning from “data poor” to “data rich”
• Astronomy: LSST
• Physics: LHC
• Oceanography: OOI
• Sociology: The Web
• Biology: Sequencing
• Economics: POS terminals
• Neuroscience: EEG, fMRI
Data Science throughout campus
Feb 15, 2013
• AMPLab: Ion Stoica, CS; Michael Franklin, CS; Matei Zaharia, CS
• Adam Arkin, Bioengineering
• Emmanuel Saez, Economics
• Reconstructing the movies in your mind (Bin Yu, Statistics; Jack Gallant, Neuroscience)
• Earthquake, Strong Shaking in 11 seconds (Richard Allen, Earth & Plan. Science)
• Fernando Perez, Brain Imaging Center
• Charles Marshall, Rosie Gillespie, Integrative Biology
-
A National Challenge
DS@UCB
In the United States, it is reported that by 2018 there will be more than 490,000 data science positions available but only 200,000 qualified people to fill them. The average graduate class in data science has 23 students. With only about 110 universities offering data science studies, the growing market will continue to pressure the supply in the US.
-
Foundations of Data Science @ UCB
-
Rolling out to the students
Data Science in the Undergrad Experience
[Diagram labels: Data Science Foundation (CS + Stat + Critical Thinking w/ data); Connectors (Soc, Biz, CS, Phys); Data Science Core; DS Major; DS Minor; Today’s Majors; Concentrations; Student Tracks]
[Diagram legend: Undergrad Class; Existing course; Thesis, Research; DS in field; DS focus]
http://data.berkeley.edu/sites/default/files/DataScienceCurriculumSketch.pdf
-
Enrollments in Berkeley’s Data Science Classes (1st 4 semesters)
[Chart: enrollment per semester for Fall 2015, Spring 2016, Fall 2016, and Spring 2017; vertical axis from 0 to 1400. Series: Foundations of Data Science (Data 8); Connector Courses (21 so far); Advanced Data Science Classes (3 new in Sp 17)]
-
Foundations of Data Science
Online syllabus, labs, homeworks, videos – all at data8.org
Online textbook – inferentialthinking.com
http://data8.org/
http://inferentialthinking.com/
-
Foundations of Data Science - “Data 8”
• All students should be able to
– reason sensibly based on data
– make and interpret inferences based on
data
– think critically about implications of data
– regardless of their specialization
• Designed for 1st and 2nd-year students of any major
– 4 units, MWF lecture + 2-hour W-F lab
– No CS or stats experience required
• ~700 students in Sp 17
• About 70 majors represented
– Tied for first: L&S CS and Economics
– Every college, ~1 dozen IEOR & ORMS
• Online syllabus, labs, homeworks,
videos – all at data8.org
• Online textbook – inferentialthinking.com
http://data8.org/sp17/
http://inferentialthinking.com/
-
Data Science Connectors
• Data and Ethics
• Making Sense of Cultural Data
• Children in the Developing World
• Social Networks
• Data Science and the Mind
• Genomics and Data Science
• Data Science for Smart Cities
• Data Science for Cognitive Neuroscience
• Comp. Structures in Data Science
• Probability and Math Stats in Data Science
• Data Science, Demography & Immigration
• Social Data Revolution
-
Data8 - Concepts and Computing
• Fundamental co-mingling of CS & Stat concepts on real data
– Learn computing concepts by doing interesting things on data
– Learn advanced statistical concepts by observing what’s interesting
– Codify understanding of concepts symbolically
• “Explorations” in visualization, privacy, personalized medicine,…
• Entirely cloud-based computing environment built on Jupyter notebooks
plus UCB datascience Tables
http://data8.org
-
Tables (https://github.com/dsten/datascience)
• A single, simple, powerful data structure for all
• Inspired by Excel, SQL, R, Pandas, Numpy, …
• An ordered collection of labeled columns of anything
[Diagram labels: label; values; Numpy array; T['label']; dict, record, tuple]
Methods shown: select, where, take, drop, group, join, stats, bin, sample, pivot, pivot_bin, split
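The idea of a table as an ordered collection of labeled columns can be illustrated with a toy class. To be clear about assumptions: `MiniTable` below is a hypothetical stand-in written for this example, not the UCB `datascience` library and not its implementation. The method names `select`, `where`, `take`, and `group` are borrowed from the slide, but the signatures and behavior here are simplified guesses (for instance, `where` takes a plain Python predicate, and `group` only counts rows).

```python
class MiniTable:
    """Toy table: an ordered mapping from column label to a list of values."""

    def __init__(self, columns=None):
        # Python dicts preserve insertion order, so column order is kept.
        self.columns = dict(columns or {})

    @property
    def num_rows(self):
        return len(next(iter(self.columns.values()), []))

    def __getitem__(self, label):
        """T['label'] returns the values stored in that column."""
        return self.columns[label]

    def select(self, *labels):
        """Keep only the named columns, in the order given."""
        return MiniTable({label: self.columns[label] for label in labels})

    def take(self, row_indices):
        """Keep only the rows at the given positions."""
        return MiniTable({
            label: [values[i] for i in row_indices]
            for label, values in self.columns.items()
        })

    def where(self, label, predicate):
        """Keep rows whose value in `label` satisfies `predicate`."""
        keep = [i for i, v in enumerate(self.columns[label]) if predicate(v)]
        return self.take(keep)

    def group(self, label):
        """Count rows per distinct value of `label`, in order of first appearance."""
        counts = {}
        for value in self.columns[label]:
            counts[value] = counts.get(value, 0) + 1
        return MiniTable({label: list(counts), "count": list(counts.values())})


# Hypothetical data, invented for illustration.
t = MiniTable({
    "major": ["CS", "Econ", "CS", "Stat"],
    "units": [4, 3, 4, 2],
})

heavy = t.where("units", lambda u: u >= 3).select("major")
by_major = t.group("major")

print(heavy["major"])
print(by_major["major"], by_major["count"])
```

Because every method returns a new table, operations chain naturally, which is the style the course's table abstraction encourages.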
-
Explorations
BEARS DS @ UCB
-
Connectors offered in 1st 4 semesters
• CivEng 88: Data Science for Smart Cities
• CivEng 88B: Time Series Analysis
• Cog Sci 88: Data Science and the Mind
• CS 88: Computational Structures in Data Science
• ESPM 88A: Geospatial Data Explorations
• ESPM 88B: Data Sciences in Ecology and the Environment
• Geog 88: Data Science Applications in Geography
• Hist 88: How Does History Count?
• Info 88A: Data and Ethics
• Legal St 88: Crime and Punishment
• MCB 88: Immunotherapy of Cancer
• Stat 88: Probability and Mathematical Statistics in DS
• Stat 89A: Introduction to Matrices and Graphs in DS
• L&S 39: Race, Policing, and Data Science
• L&S 88: Health, Human Behavior, and Data
• L&S 88: Child Development Around the World
• L&S 88: Literature and Data
• L&S 88: Genomics and Data Science
• L&S 88: Social Networks
• L&S 88: Data Science for Cognitive Neuroscience
• L&S 88: Data Science, Demography, and Immigration
-
Data 8 Demographics, Fall 2016
• 517 students from 59 majors
-
Online syllabus and more – ds100.org
NEW: The upper-division gateway, DS 100
http://ds100.org/
-
Big Concepts in DS 100
• Data preparation and representation
• Efficient and scalable data processing
• Question formulation and experimental design
• Exploratory data analysis and visualization
• Model fitting and inference
• Machine learning techniques and overfitting
• Validation and hypothesis testing
-
“pilot” DS 100 syllabus
[Schedule chart: Week 1 through Week 16. Blocks: Big Concepts in Data Science; Working with Real Data and SQL; Basic Statistical Modeling; Inference, Prediction, & Machine Learning; ML Techniques; Analytics at Scale; Review & Midterm; Spring Break; RRR. Assignments: HW1 through HW6]
http://www.ds100.org/sp17/syllabus
-
Foundations of Data Science
Possible DS Major Structure
Mathematics College Breadth Electives
Principles and Techniques of Data Science
Statistics Depth
External Emphasis
Statistical Machine Learning Implications
Computing
CS Depth
-
Institutional Refactoring
[Diagram labels: College of Letters & Science (Biological Sciences; Math & Physical Sciences; Arts & Humanities; Social Sciences); Computational & Data Sciences & Eng.; Engineering; Professional Schools]
-
NEW: Probability for Data Science
-
NEW: Statistical Methods for Data Science