
  • Designing Curriculum for A.I.

    Costas J. Spanos, for David Culler

    Costas J. Spanos

    Andrew S. Grove Distinguished Professor, EECS, UC Berkeley

    Director, CITRIS & the Banatao Institute

    CTO, Berkeley Education Alliance for Research in Singapore

    5/29/2017

    5/29/17 NTU 1

  • Deep Learning


  • Inferential Thinking, or estimating from incomplete data

    • Generalizations (predictions, parameter estimates, and conclusions) that go beyond describing the given data;

    • Use of data as evidence for those generalizations;

    • Conclusions that express a degree of uncertainty, accounting for the variability or uncertainty inherent in generalizing beyond the immediate data to a population or a process.
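One common way to turn these three ideas into practice is the bootstrap: resample the observed data with replacement, recompute the statistic each time, and report an interval rather than a single number. The sketch below is a minimal illustration under stated assumptions; the function name `bootstrap_ci` and the waiting-time data are hypothetical, and the percentile interval is only one of several bootstrap variants.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, reps=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for a population parameter,
    estimated only from the observed sample (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample the data with replacement (same size as the original sample)
    # and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))
    # Cut off (1 - level) / 2 of the resampled estimates in each tail.
    k = round((1 - level) / 2 * reps)
    return estimates[k], estimates[reps - 1 - k]

# Hypothetical sample: waiting times (minutes) observed for 12 customers.
observed = [4.1, 6.3, 2.8, 7.5, 5.0, 3.9, 6.8, 4.4, 5.7, 3.2, 8.1, 4.9]
low, high = bootstrap_ci(observed)
print(f"estimate {statistics.mean(observed):.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

The point estimate goes beyond the given data (it targets the population mean), the resampling uses the data as evidence, and the interval expresses the remaining uncertainty.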


  • Sampling and Estimation

    Sampling: drawing observations from a population in order to make inferences about that population.

    Random sampling: when the observations are independent and identically distributed (i.i.d.).

    Statistic: a function of the sample data containing no unknown parameters (e.g., average, median, standard deviation).

    A statistic is a random variable; its distribution is called a sampling distribution.
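Because a statistic is a random variable, its sampling distribution can be observed directly by simulation. The sketch below assumes a hypothetical, fully known population (10,000 uniform values) purely so there is something to sample from; drawing with replacement keeps the observations i.i.d. The helper name `sampling_distribution` is illustrative.

```python
import random
import statistics

rng = random.Random(42)

# A hypothetical, fully known "population" (uniform on 0..100),
# used only so the simulation has something to sample from.
population = [rng.uniform(0, 100) for _ in range(10_000)]

def sampling_distribution(pop, n, reps, rng):
    """Draw `reps` random samples of size n (with replacement, so draws
    are i.i.d.) and return the sample mean of each one."""
    return [statistics.mean(rng.choices(pop, k=n)) for _ in range(reps)]

means_n10 = sampling_distribution(population, n=10, reps=2000, rng=rng)
means_n100 = sampling_distribution(population, n=100, reps=2000, rng=rng)

print("population mean:", round(statistics.mean(population), 2))
# The statistic varies from sample to sample; its spread shrinks as n grows.
print("SD of sample means, n=10: ", round(statistics.stdev(means_n10), 2))
print("SD of sample means, n=100:", round(statistics.stdev(means_n100), 2))
```

The spread of the sample means falls roughly as 1/√n, which is the basis for the precision and sample-size reasoning on the next slide.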


  • Inference

    Basic Statistics

    • Mean from data

    • Variance from data

    • Assume distribution is normal

    • Worry about estimation precision and sample size implications.

    Real Life Generalization
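The "basic statistics" recipe can be written out in a few lines: estimate the mean and variance from data, assume normality to get an interval, and invert the interval width to see how sample size drives precision. This is a minimal sketch; the helper names and the measurement values are hypothetical, and with very small samples a Student t critical value would be more accurate than the normal z used here.

```python
import math
import statistics

def normal_ci(data, level=0.95):
    """Mean, standard error and a normal-approximation confidence interval."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # sample SD (divides by n - 1)
    se = s / math.sqrt(n)               # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean, se, (mean - z * se, mean + z * se)

def sample_size_for_margin(sigma, margin, level=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. n >= (z*sigma/margin)^2."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical measurements from 8 units.
data = [101.2, 99.8, 100.5, 98.9, 100.9, 101.7, 99.4, 100.2]
mean, se, (lo, hi) = normal_ci(data)
print(f"mean={mean:.2f}  SE={se:.3f}  95% CI=({lo:.2f}, {hi:.2f})")

# Halving the margin of error roughly quadruples the required sample size.
print(sample_size_for_margin(sigma=1.0, margin=0.5))   # 16
print(sample_size_for_margin(sigma=1.0, margin=0.25))  # 62
```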


  • Data and more Data…

    6/1/17 Inclusive Innovation

  • The AI (R)evolution: from expert to data driven


  • Data Science at the Heart of a 21st Century University

    David Culler

    University of California, Berkeley

    http://data.berkeley.edu

  • The vision …

    In the 21st century every college student should be prepared to understand and develop points of view based on the analysis of data as well as evaluate arguments made by others.

    12/12/16 NAS UCB DSed 9

  • Vectors of Change: Nearly every field of discovery is transitioning from “data poor” to “data rich”

    Astronomy: LSST

    Physics: LHC

    Oceanography: OOI

    Sociology: The Web

    Biology: Sequencing

    Economics: POS terminals

    Neuroscience: EEG, fMRI

    Data Science throughout campus

    Feb 15, 2013

    AMPLab: Ion Stoica, CS; Michael Franklin, CS; Matei Zaharia, CS

    Adam Arkin, Bioengineering

    Emmanuel Saez, Economics

    Reconstructing the movies in your mind: Bin Yu, Statistics; Jack Gallant, Neuroscience

    Earthquake Strong Shaking in 11 seconds: Richard Allen, Earth & Plan. Science

    Fernando Perez, Brain Imaging Center

    Charles Marshall, Rosie Gillespie, Integrative Biology

  • A National Challenge

    DS@UCB

    In the United States, it is reported that by 2018 there will be more than 490,000 data science positions available but only 200,000 qualified people to fill them. The average graduate class of data science students has 23 students. With only about 110 universities offering data science programs, the growing market will continue to strain the US supply of qualified people.
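The size of the gap follows from simple arithmetic on the slide's own figures. This is a back-of-the-envelope sketch that takes those figures at face value; treating each of the ~110 programs as producing one class of 23 graduates per cohort is a simplifying assumption, not a claim made on the slide.

```python
# Figures quoted on the slide (reported projections for the US, 2018).
positions = 490_000        # projected data science positions
qualified = 200_000        # qualified people available
programs = 110             # universities offering data science studies
avg_class = 23             # average graduate class size

shortfall = positions - qualified
# Assumption: one graduating class of average size per program per cohort.
graduates_per_cohort = programs * avg_class

print(f"shortfall: {shortfall:,} people")                                          # 290,000
print(f"one graduating cohort: about {graduates_per_cohort:,} students")           # 2,530
print(f"cohorts needed to close the gap: ~{shortfall / graduates_per_cohort:.0f}") # ~115
```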

  • Foundations of Data Science @ UCB

  • Data Science in the Undergrad Experience: Rolling out to the students

    Data Science Foundation (CS + Stat + Critical Thinking w/ data)

    Today’s Majors; Data Science Core; DS Major; DS Minor

    Undergrad Class; Existing course; Thesis, Research; DS in field; DS focus

    Connectors (Soc, Biz, CS, Phys); Concentrations; Student Tracks

    http://data.berkeley.edu/sites/default/files/DataScienceCurriculumSketch.pdf

  • Enrollments in Berkeley’s Data Science Classes (1st 4 semesters)

    [Chart: enrollments (0 to 1,400 students) per semester, Fall 2015 through Spring 2017, for Foundations of Data Science (Data 8), Connector Courses (21 so far), and Advanced Data Science Classes (3 new in Sp 17)]

  • Foundations of Data Science

    Online syllabus, labs, homeworks, videos – all at data8.org

    Online textbook – inferentialthinking.com

    http://data8.org/ http://inferentialthinking.com/

  • Foundations of Data Science - “Data 8”

    • All students should be able to
      – reason sensibly based on data
      – make and interpret inferences based on data
      – think critically about implications of data
      – regardless of their specialization

    • Designed for first- and second-year students of any major
      – 4 units, MWF lecture + 2-hour W-F lab
      – No CS or stats experience required

    • ~700 students in Sp 17

    • About 70 majors represented
      – Tied for first: L&S CS and Economics
      – Every college, ~1 dozen IEOR & ORMS

    • Online syllabus, labs, homeworks, videos – all at data8.org

    • Online textbook – inferentialthinking.com

    http://data8.org/sp17/ http://inferentialthinking.com/

  • Data Science Connectors

    – Data and Ethics
    – Making Sense of Cultural Data
    – Children in the Developing World
    – Social Networks
    – Data Science and the Mind
    – Genomics and Data Science
    – Data Science for Smart Cities
    – Data Science for Cognitive Neuroscience
    – Comp. Structures in Data Science
    – Probability and Math Stats in Data Science
    – Data Science, Demography & Immigration
    – Social Data Revolution

  • Data8 - Concepts and Computing

    • Fundamental co-mingling of CS & Stat concepts on real data
      – Learn computing concepts by doing interesting things on data
      – Learn advanced statistical concepts by observing what’s interesting
      – Codify understanding of concepts symbolically

    • “Explorations” in visualization, privacy, personalized medicine, …

    • Entirely cloud-based computing environment built on Jupyter notebooks plus UCB datascience Tables

    http://data8.org
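As an illustration of learning a statistical concept by computing on data, the sketch below runs a permutation test for a difference in group means: shuffle the group labels many times and see how often chance alone produces a difference as large as the observed one. It is plain Python rather than the datascience Tables library, and the two groups of scores are invented for the example.

```python
import random
import statistics

def permutation_test(a, b, reps=10_000, seed=1):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)                     # relabel at random under the null
        sim_a, sim_b = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(sim_a) - statistics.mean(sim_b)) >= observed:
            count += 1
    return observed, count / reps               # empirical p-value

# Hypothetical scores for two sections of a class.
section_a = [72, 85, 78, 90, 66, 81, 77, 88]
section_b = [64, 70, 75, 68, 73, 61, 79, 69]
diff, p = permutation_test(section_a, section_b)
print(f"observed |difference| = {diff:.2f}, p ≈ {p:.4f}")
```

The loop is the computing concept; the fraction of shuffles at least as extreme as the data is the statistical one.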

  • Tables (https://github.com/dsten/datascience)

    • A single, simple, powerful data structure for all

    • Inspired by Excel, SQL, R, Pandas, Numpy, …

    • An ordered collection of labeled columns of anything
      – Columns: label plus values; T['label'] gives a Numpy array
      – Rows: dict, record, tuple
      – Operations: select, where, take, drop, group, join, stats, bin, sample, pivot, pivot_bin, split
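The core idea of a Table, an ordered collection of labeled columns with row-filtering and grouping operations, can be sketched in a few lines of plain Python. The class `MiniTable` below is hypothetical and illustrative only: it is not the datascience library's API, its methods only loosely mirror the operation names on the slide, and it returns lists where the real Table returns Numpy arrays. The student data are made up.

```python
from collections import Counter

class MiniTable:
    """Hypothetical, minimal sketch of an ordered collection of labeled columns.
    Illustrative only: this is NOT the datascience library's Table API."""

    def __init__(self, columns):
        # columns: {label: sequence of values}; dicts keep insertion order (3.7+)
        self.columns = {label: list(values) for label, values in columns.items()}

    def __getitem__(self, label):
        # Column access by label (a plain list here, not a Numpy array)
        return self.columns[label]

    def num_rows(self):
        return len(next(iter(self.columns.values()), []))

    def row(self, i):
        # One row as a dict of label -> value
        return {label: values[i] for label, values in self.columns.items()}

    def select(self, *labels):
        # Keep only the named columns, in the order given
        return MiniTable({label: self.columns[label] for label in labels})

    def where(self, label, predicate):
        # Keep only the rows whose value in `label` satisfies `predicate`
        keep = [i for i, v in enumerate(self.columns[label]) if predicate(v)]
        return MiniTable({lab: [vals[i] for i in keep] for lab, vals in self.columns.items()})

    def group(self, label):
        # Count rows per distinct value of `label`, sorted by that value
        counts = Counter(self.columns[label])
        keys = sorted(counts)
        return MiniTable({label: keys, "count": [counts[k] for k in keys]})


students = MiniTable({
    "name":  ["Ana", "Ben", "Chen", "Dee", "Eli"],
    "major": ["Econ", "CS", "CS", "Stat", "Econ"],
    "units": [16, 14, 18, 15, 13],
})
heavy = students.where("units", lambda u: u >= 15).select("name", "major")
print(heavy["name"])                    # ['Ana', 'Chen', 'Dee']
print(students.group("major").columns)  # {'major': ['CS', 'Econ', 'Stat'], 'count': [2, 2, 1]}
```

Keeping every operation as "table in, table out" is what lets such calls be chained, which is the design property the slide's single data structure aims for.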

  • Explorations

    BEARS DS @ UCB

  • Connectors offered in the first 4 semesters

    – CivEng 88 Data Science for Smart Cities
    – CivEng 88B Time Series Analysis
    – Cog Sci 88 Data Science and the Mind
    – CS 88 Computational Structures in Data Science
    – ESPM 88A Geospatial Data Explorations
    – ESPM 88B Data Sciences in Ecology and the Environment
    – Geog 88 Data Science Applications in Geography
    – Hist 88 How Does History Count?
    – Info 88A Data and Ethics
    – Legal St 88 Crime and Punishment
    – MCB 88 Immunotherapy of Cancer
    – Stat 88 Probability and Mathematical Statistics in DS
    – Stat 89A Introduction to Matrices and Graphs in DS
    – L&S 39 Race, Policing, and Data Science
    – L&S 88 Health, Human Behavior, and Data
    – L&S 88 Child Development Around the World
    – L&S 88 Literature and Data
    – L&S 88 Genomics and Data Science
    – L&S 88 Social Networks
    – L&S 88 Data Science for Cognitive Neuroscience
    – L&S 88 Data Science, Demography, and Immigration

  • Data 8 Demographics, Fall 2016

    • 517 students from 59 majors

  • NEW: The upper-division gateway, DS 100

    Online syllabus and more – ds100.org

    http://ds100.org/

  • Big Concepts in DS 100

    • Data preparation and representation

    • Efficient and scalable data processing

    • Question formulation and experimental design

    • Exploratory data analysis and visualization

    • Model fitting and inference

    • Machine learning techniques and overfitting

    • Validation and hypothesis testing
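Fitting, overfitting and validation can be seen side by side with two models: a least-squares line and a 1-nearest-neighbour regressor that simply memorises the training set. This is a minimal sketch on hypothetical noisy linear data with a held-out validation set; the models, data and helper names are illustrative choices, not taken from the DS 100 materials.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical data: y = 2x + 1 plus Gaussian noise (sd = 2).
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [2 * x + 1 + rng.gauss(0, 2) for x in xs]
points = list(zip(xs, ys))
train, valid = points[:200], points[200:]   # hold out 100 points for validation

def fit_line(pts):
    """Ordinary least squares fit of y = a + b*x."""
    mx = statistics.mean(x for x, _ in pts)
    my = statistics.mean(y for _, y in pts)
    b = (sum((x - mx) * (y - my) for x, y in pts)
         / sum((x - mx) ** 2 for x, _ in pts))
    a = my - b * mx
    return lambda q: a + b * q

def fit_1nn(pts):
    """1-nearest-neighbour regression: it memorises the training set."""
    return lambda q: min(pts, key=lambda p: abs(p[0] - q))[1]

def mse(model, pts):
    return statistics.mean((model(x) - y) ** 2 for x, y in pts)

results = {}
for name, fit in [("linear", fit_line), ("1-NN", fit_1nn)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, valid))
    print(f"{name:6s} train MSE = {results[name][0]:5.2f}   "
          f"validation MSE = {results[name][1]:5.2f}")
```

The 1-NN model has zero training error yet does worse on held-out data, which is overfitting in miniature; only the validation column reveals it.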


  • “Pilot” DS 100 syllabus: http://www.ds100.org/sp17/syllabus

    Weeks 1–16, including Review & Midterm, Spring Break, and RRR week

    Topic blocks, in order:
      – Big Concepts in Data Science
      – Working with Real Data and SQL
      – Basic Statistical Modeling
      – Inference, Prediction, & Machine Learning
      – ML Techniques (two blocks)
      – Analytics at Scale

    Homework: HW1 through HW6
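In the spirit of the "Working with Real Data and SQL" block, the sketch below loads a small table into an in-memory database with Python's built-in `sqlite3` module and answers a question with a `GROUP BY` query. The schema (`dept`, `number`, `title`) is a hypothetical choice; the rows are the connector courses listed on the "Connectors offered in 1st 4 sems" slide.

```python
import sqlite3

# Connector courses as listed on the "Connectors offered in 1st 4 sems" slide.
connectors = [
    ("CivEng", "88", "Data Science for Smart Cities"),
    ("CivEng", "88B", "Time Series Analysis"),
    ("Cog Sci", "88", "Data Science and the Mind"),
    ("CS", "88", "Computational Structures in Data Science"),
    ("ESPM", "88A", "Geospatial Data Explorations"),
    ("ESPM", "88B", "Data Sciences in Ecology and the Environment"),
    ("Geog", "88", "Data Science Applications in Geography"),
    ("Hist", "88", "How Does History Count?"),
    ("Info", "88A", "Data and Ethics"),
    ("Legal St", "88", "Crime and Punishment"),
    ("MCB", "88", "Immunotherapy of Cancer"),
    ("Stat", "88", "Probability and Mathematical Statistics in DS"),
    ("Stat", "89A", "Introduction to Matrices and Graphs in DS"),
    ("L&S", "39", "Race, Policing, and Data Science"),
    ("L&S", "88", "Health, Human Behavior, and Data"),
    ("L&S", "88", "Child Development Around the World"),
    ("L&S", "88", "Literature and Data"),
    ("L&S", "88", "Genomics and Data Science"),
    ("L&S", "88", "Social Networks"),
    ("L&S", "88", "Data Science for Cognitive Neuroscience"),
    ("L&S", "88", "Data Science, Demography, and Immigration"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema chosen for this example.
conn.execute("CREATE TABLE connector (dept TEXT, number TEXT, title TEXT)")
conn.executemany("INSERT INTO connector VALUES (?, ?, ?)", connectors)

# Count connector courses per department, largest first (ties broken by name).
rows = conn.execute(
    """SELECT dept, COUNT(*) AS n
       FROM connector
       GROUP BY dept
       ORDER BY n DESC, dept"""
).fetchall()
for dept, n in rows:
    print(f"{dept:9s} {n}")
conn.close()
```

Parameterized `?` placeholders with `executemany` keep data out of the SQL text, which is the idiomatic way to load rows with `sqlite3`.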

  • Possible DS Major Structure

    Components: Foundations of Data Science; Mathematics; Computing; Principles and Techniques of Data Science; Statistics Depth; CS Depth; Statistical Machine Learning; Implications; External Emphasis; College Breadth; Electives

  • Institutional Refactoring

    College of Letters & Science: Biological Sciences; Math & Physical Sciences; Arts & Humanities; Social Sciences

    Computational & Data Sciences & Eng.

    Engineering; Professional Schools

  • NEW: Probability for Data Science


  • NEW: Statistical Methods for Data Science
