Page 1: Lecture1 – Introduction and Organization

8/29/2006 1

Lecture1 – Introduction and Organization

Rice ELEC 697

Farinaz Koushanfar

Fall 2006

Page 2: Lecture1 – Introduction and Organization

Summary

• Syllabus

• Course outline

• Motivation

• Class census

Page 3: Lecture1 – Introduction and Organization

Syllabus – ELEC 697

• Title: “Applications of Modern Statistical Learning Theory in Embedded Networked Systems”

• Instructor – Farinaz Koushanfar, Rice University

• Meeting time – 02:30 PM - 03:50 PM TR 

• Meeting place – 2014, Duncan Hall

• Prerequisites – Self-contained, but assumes undergraduate-level knowledge of probability and math

Page 4: Lecture1 – Introduction and Organization

Syllabus - Overview and Goals

• Overview
  – Practical statistical learning methods and tools
  – Modeling and optimizing emerging embedded systems
  – Research areas: embedded networked systems, sensor networks, your research area, assuming you will need the methods there
  – Emphasizing the methods rather than the theoretical aspects

• Goals
  – Solid understanding of the state-of-the-art learning methods
  – Hands-on experience with statistical modeling SW
  – Applications of statistical modeling in SN, Internet, Networks, Intrusion detection, CAD, VLSI
  – A universal tool for your own research

Page 5: Lecture1 – Introduction and Organization

Syllabus – Book and More…

• Textbook
  – The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie; R. Tibshirani; J. Friedman; New York: Springer, 2001.

• Recommended further reading
  – Pattern Classification (2nd ed.), R. Duda; P. Hart; D. Stork; Wiley Interscience, 2001.
  – Modern Applied Statistics with S-PLUS, Third Edition, W. Venables; B. Ripley; Springer, 1999.
  – Papers from the literature

• Course webpage
  – http://www.ece.rice.edu/~fk1/classes/ELEC697.htm

Page 6: Lecture1 – Introduction and Organization

Syllabus – Grading and Project

• Grading
  – Weekly assignments (20%)
  – Mid-semester oral presentation (15%)
  – Paper presentation and discussion (15%)
  – Class project report (30%)
  – Class project presentation (20%)

• Project
  – Groups of 1 or 2 (collaborations encouraged)
  – Dataset to analyze and model, can be more theoretical
  – Either propose or select from my projects/datasets

Page 7: Lecture1 – Introduction and Organization

Syllabus - Software

• Hands-on experience with data analysis and modeling tools

• S programming language (Splus/R)

• You can download R from CRAN at: http://cran.us.r-project.org/

• Documentation is also available at CRAN

• Many more resources available on the web

Page 8: Lecture1 – Introduction and Organization

Course Outline

• Week 1: Orientation and overview of supervised learning and its applications in embedded networks
• Week 2: Intro to R, linear regression, model selection, validation
• Week 3: Applications of regression in embedded networks (HW 0)
• Week 4: Linear classification: LDA, logistic, separating hyperplanes
• Week 5: Applications of classification in embedded networks (HW 1)
• Week 6: Available datasets, possible project proposals, and project selection
• Week 7: Model assessment and selection
• Week 8: Applications of model selection and validation in embedded networked systems (HW 2)

Page 9: Lecture1 – Introduction and Organization

Course Outline (Cont’d)

• Week 9: Kernel methods
• Week 10: Applications of kernel methods in embedded networked systems (HW 3)
• Week 11: Mid-term project proposal and presentations
• Week 12: Model inference and averaging: boosting, ML, EM
• Week 13: Applications of model inference in embedded networked systems (HW 4)
• Week 14: Progress report: presenting the related work to your project and your goals
• Week 15: Summary
• Week 16: Final project presentation and reports (Report)

+ Paper presentations!

Page 10: Lecture1 – Introduction and Organization

Class Census

• Tell me about yourself!
• Your name
• Your year of study
• Your field – or your interest
• Your advisor

Page 11: Lecture1 – Introduction and Organization

Statistical Learning - General

Key role in science, finance, and industry. Examples:

• Predict the probability of a second heart attack (demographic, diet, clinical measures)
• Predict stock prices in 6 months (company performance and economic data)
• Identify the numbers in a handwritten ZIP code
• Estimate the glucose level in a diabetic patient's blood (infrared absorption spectrum)
• Identify the risk factors for prostate cancer (clinical and demographic variables)

Page 12: Lecture1 – Introduction and Organization

Sensor Networks (SN)

[Figure: sensor network applications (Contaminant Transport, Environmental Sensing, Seismic Response) and xbow MICA2DOT motes. Courtesy of Prof. Deborah Estrin (UCLA-CENS)]

Page 13: Lecture1 – Introduction and Organization

Statistical Learning - SN

• Classification/target detection

• Modeling the biological systems

• Inter-sensor modeling
  – Sleeping coordination, compression, intrusion detection/security

• Characterization of sensors – a rapidly growing market, e.g.
  – Pressure sensors – revenue: $4,018.8M in 2004, projected $5,545.1M in 2011
  – Image sensors – $4B++ in 2005, led by the camera phone application
  – Fiber-optic sensors – $288.1M now, will be $304.3M in 2006
  – Bio-sensors – ??
  – Proximity, Photoelectric, Linear Displacement Sensors – $1B in 2004, will be $1.05B in 2007
  – Nano-sensors – will grow 30%+ by 2009

Sensors & Transducers Magazine (S&T e-Digest), Vol.62, Issue 12, December 2005, pp.456-461

Page 14: Lecture1 – Introduction and Organization

Statistical Learning – VLSI/CAD

• Nanometer-scale devices: increased process variation and decreased predictability of circuit performance

• Traditionally corner-case models were used – pessimistic

• The magnitude of variations in the gate length is predicted to increase from 35% in a 130nm technology to ~60% in a 70nm technology

• The variations are specified as the fraction 3σ/μ

• The major trade-off is the computational efficiency

King, Wada, Woo, IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING,VOL. 17, NO. 2, MAY 2004

[Figure: photoresist line pattern; PDF]
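
As a rough illustration of how a variation figure of this kind can be computed, the sketch below assumes the fraction on this slide is the usual ratio of three standard deviations to the mean (3σ/μ). It is written in Python rather than the course's R, and the function name and the gate-length measurements are hypothetical, made up only to show the arithmetic.

```python
import statistics


def three_sigma_over_mean(samples):
    """Return 3*sigma/mu for a list of measurements.

    Uses the sample standard deviation (n - 1 in the denominator).
    """
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return 3.0 * sigma / mu


# Hypothetical gate-length measurements in nanometers (illustrative values only).
gate_lengths_nm = [128.0, 131.0, 130.0, 127.0, 134.0, 129.0, 132.0, 129.0]
print(f"3*sigma/mu = {three_sigma_over_mean(gate_lengths_nm):.1%}")
```

A larger value means the parameter is spread more widely around its nominal value, which is what makes corner-case models increasingly pessimistic.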

Page 15: Lecture1 – Introduction and Organization

Sources of Variations

• Process variations
  – The value of process parameters observed after fabrication
  – Parametric yield: the fraction of manufactured samples that meet the performance constraints

• Environmental variations

• Modeling variations
  – Power and delay models used to perform design, analysis and optimization are inaccurate

• Other sources
  – Change in process parameters with time
  – Hot electrons
  – Process instability
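
The definition of parametric yield above can be sketched as a small Monte Carlo experiment: draw many simulated chips, evaluate a performance model on each, and report the fraction that meets the constraint. Everything specific here is an assumption for illustration only: the Gaussian gate-length distribution, the linear `toy_delay_ps` model, and the numeric values are hypothetical and not taken from the lecture.

```python
import random


def toy_delay_ps(gate_length_nm):
    """Hypothetical delay model: delay grows linearly with gate length."""
    return 0.5 * gate_length_nm


def estimate_parametric_yield(n_samples, mean_nm, sigma_nm, max_delay_ps, seed=0):
    """Estimate the fraction of sampled chips whose delay meets the constraint."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    passing = 0
    for _ in range(n_samples):
        gate_length_nm = rng.gauss(mean_nm, sigma_nm)  # one simulated chip
        if toy_delay_ps(gate_length_nm) <= max_delay_ps:
            passing += 1
    return passing / n_samples


# Illustrative numbers only: 130 nm nominal gate length, 5 nm standard deviation.
yield_estimate = estimate_parametric_yield(
    n_samples=20000, mean_nm=130.0, sigma_nm=5.0, max_delay_ps=67.5
)
print(f"estimated parametric yield: {yield_estimate:.3f}")
```

Tightening `max_delay_ps` or widening `sigma_nm` lowers the estimate, which is the trade-off that statistical models of variation are meant to quantify.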

Page 16: Lecture1 – Introduction and Organization

The Theme of the Course

• About practical learning methods – something you can learn and use in your research

• This is not an embedded system design course nor a sensor network design course!

• The research topics are there to motivate real applications of statistical learning in other fields

• You do not need any prior knowledge of these subjects to learn in this course

• Dynamic reading list

Page 17: Lecture1 – Introduction and Organization

Learning from Data

• Supervised learning
  – Outcome measurement: either categorical or quantitative
  – Predict outcome from a set of features
  – Training set of data
  – A good learner can predict a testing set well

• Unsupervised learning
  – Only features, no outcome
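
The supervised-learning setup above (features, a categorical outcome, a training set, and a held-out testing set) can be sketched in a few lines. This is a minimal illustration, not the course's method: the 1-nearest-neighbor learner is swapped in only because it is short, and the synthetic two-class data and all function names are hypothetical.

```python
import random


def nearest_neighbor_predict(train, x):
    """Predict the label of x as the label of the closest training point."""
    closest = min(
        train,
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    return closest[1]


def accuracy(train, test):
    """Fraction of points in `test` whose predicted label is correct."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)


def make_toy_data(n, rng):
    """Synthetic data: two Gaussian clusters, one per class (illustrative only)."""
    data = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 3.0
        features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((features, label))
    return data


rng = random.Random(0)
data = make_toy_data(200, rng)
train, test = data[:150], data[150:]  # learn on 150 points, evaluate on 50
print("test accuracy:", accuracy(train, test))
```

Accuracy on the training set is trivially perfect for this learner, which is why the held-out testing set is the meaningful measure of how good it is.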

Page 18: Lecture1 – Introduction and Organization

Example 1: Email Spam

• Categorical outcome: spam or email

• 4601 email messages

• Rule based learning, e.g.
  – if (%george < 0.6) & (%you > 1.5) then spam else email
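
The rule above translates directly into code. The sketch below assumes that %george and %you mean the percentage of words in a message equal to "george" and "you"; the whitespace tokenization in `word_percentage` and both function names are hypothetical choices for illustration.

```python
def word_percentage(text, word):
    """Percentage of whitespace-separated words in `text` equal to `word`."""
    words = text.lower().split()
    if not words:
        return 0.0
    return 100.0 * words.count(word) / len(words)


def classify_message(pct_george, pct_you):
    """Apply the slide's rule: spam if %george < 0.6 and %you > 1.5, else email."""
    if pct_george < 0.6 and pct_you > 1.5:
        return "spam"
    return "email"


message = "you have won a prize and you must claim it now"
label = classify_message(
    word_percentage(message, "george"),
    word_percentage(message, "you"),
)
print(label)
```

A learned classifier would choose the thresholds 0.6 and 1.5 from the training messages rather than fixing them by hand.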

Page 19: Lecture1 – Introduction and Organization

Example 2: Prostate Cancer

• Correlation between the level of prostate specific antigen (PSA) and clinical predictors

• Regression problem!
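
To make "regression problem" concrete, here is a minimal sketch of least-squares fitting with a single predictor. The real prostate cancer data has several clinical predictors; the one-predictor simplification, the function name `fit_line`, and the numbers below are all hypothetical and chosen only to show the computation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up values standing in for one clinical predictor and the measured outcome.
predictor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
outcome = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

intercept, slope = fit_line(predictor, outcome)
print(f"outcome = {intercept:.2f} + {slope:.2f} * predictor")
```

Unlike the spam example, the outcome here is quantitative, so the fitted model predicts a number rather than a class label.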

Page 20: Lecture1 – Introduction and Organization

Example 3: Handwritten Digit Recognition

• Automatic envelope sorting procedure

• 16x16 8-bit grayscale, intensity from 0-255

• Classification problem!
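
One common way to feed such an image to a classifier is to flatten it into a feature vector, one feature per pixel. The sketch below shows that step under stated assumptions: the image is given as a 16x16 nested list of integers, and scaling intensities to the range 0 to 1 is an illustrative choice, not something the slide specifies. The function name is hypothetical.

```python
def image_to_features(image):
    """Flatten a 16x16 grayscale image (values 0-255) into 256 features in [0, 1]."""
    if len(image) != 16 or any(len(row) != 16 for row in image):
        raise ValueError("expected a 16x16 image")
    features = []
    for row in image:
        for pixel in row:
            if not 0 <= pixel <= 255:
                raise ValueError("pixel intensity must be in 0-255")
            features.append(pixel / 255.0)
    return features


# A blank image with a single bright vertical stroke, loosely resembling a "1".
image = [[255 if col == 8 else 0 for col in range(16)] for _ in range(16)]
features = image_to_features(image)
print(len(features), "features per digit image")
```

Each digit then becomes one point in a 256-dimensional feature space, and the classifier assigns it to one of the ten classes 0 through 9.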