Post on 07-Apr-2018

8/6/2019 Field Investigations in Statistics

http://slidepdf.com/reader/full/field-investigations-in-statistics 1/21

FIELD INVESTIGATIONS IN STATISTICS

A CURRICULUM FOR TRAINING FIRST-TIME RESEARCHERS 

Timothy Vogel, M.S. Statistics; Florida State University

Nathan Good, Ph.D. Computer Science; U-C Berkeley

"Science begins with observation, but it is not until we count and measure that we have begun to truly study a thing."

Lord Kelvin [1824-1907]


Class Development Motivation

A recent complaint about science education is that it doesn't teach young scientists how science is actually done. This class seeks to fill that glaring gap in scientific training.

Tim Vogel's first elective Biology classes at the University of Illinois in 1975 were Biology 208-209, two semesters of "Field Investigations in Biology". The course comprised six topical areas within the biological sciences and afforded undergraduates the rare opportunity to be trained as principal research investigators. Authoring their own hypotheses, experimental designs, technical papers, methodologies, statistical analyses, and result presentations under "peer" review, these students "did" actual science for the first time under the supervision of experts.

We take the same approach here, proposing to train fledgling researchers by allowing them to propose, run, analyze, summarize, and defend their own research projects across nine units of established statistical inference.


Prerequisites

•  Undergraduate classes in:
   •  Introductory Statistics
   •  Rhetoric or Technical Writing
•  60 hours of undergraduate coursework.
•  Basic computer skills (spreadsheets; SAS, R, Matlab, etc.)
•  Boundless curiosity about the world.
•  A commitment to accomplishment tempered by the ego-strength required to risk public failure.


Class Goals

On completing this class, each student will have:
•  gained the confidence to tackle any research project that presents itself.
•  garnered a profound sense of the nature of inference.
•  learned to recognize the unique inferential signature, or lack thereof, inherent to all exercises in inference generation.


Grading

•  project design and implementation 40% (Units 1-8 at 5% each)
•  final proposal and paper 40% (20% each)
•  attendance 10%
•  final examination 10%
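As a rough illustration of how these weights combine, the sketch below computes a final course percentage from component scores. It assumes that each of Units 1-8 is scored on a 0-100 scale and counts 5% (one reading of the slide); the function name, dictionary keys, and sample scores are hypothetical.

```python
# Weights as read from the grading slide.
# Assumption: Units 1-8 count 5% each, for 40% in total.
UNIT_WEIGHT = 0.05
OTHER_WEIGHTS = {
    "proposal": 0.20,
    "paper": 0.20,
    "attendance": 0.10,
    "final_exam": 0.10,
}


def course_grade(unit_scores, other_scores):
    """Return the weighted course percentage (0-100).

    unit_scores: eight scores (0-100), one per Unit 1-8 project.
    other_scores: dict holding a 0-100 score for each key in OTHER_WEIGHTS.
    """
    if len(unit_scores) != 8:
        raise ValueError("expected one score for each of Units 1-8")
    total = sum(UNIT_WEIGHT * score for score in unit_scores)
    total += sum(w * other_scores[name] for name, w in OTHER_WEIGHTS.items())
    return total


if __name__ == "__main__":
    # Hypothetical student, not real data.
    units = [90, 85, 80, 95, 70, 88, 92, 75]
    others = {"proposal": 85, "paper": 90, "attendance": 100, "final_exam": 80}
    print(round(course_grade(units, others), 2))
```

The weights sum to 100%, so a student with full marks on every component receives 100.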


The First Class
Introduction to the 9 statistical units

1.  Hypothesis Testing
2.  Tests of Independence (correlation)
3.  Discrete Data Analysis
4.  Analysis of Variance/ANOVA
5.  Estimation and Ordinary Least Squares Regression
6.  Multiple Regression and MANOVA
7.  Bayesian Inference
8.  Unstructured Data Analysis;
   •  Data Mining and Text Mining
9.  Carte Blanche;
   •  pick your own poison.
   •  logistic regression, logits/probits, eigenspace, multidimensional scaling, canonical correlation, principal components analysis, factor analysis, supervised and unsupervised clustering, edit distances, similarity/distance measures, cladistics (numerical taxonomy), neural nets, time-series analysis, ...


The First Class (continued)
How each class period will be conducted

Class time trajectory for each unit:
•  lecture/review of that unit's statistical approach.
•  presentation of an example from the past; well-known experiments from the archives of the "Science Hall of Fame".
•  group break-out sessions discussing research questions and experimental designs of your own.
•  individual class presentation summarizing each student's proposed study and how they expect to generate their own data.
•  constructive critique by staff and students.
•  individual work to finish off your introductory proposal.


A Study You Could Run At Home
Could I prove this with my cat or dog?

•  When You Are Generous Your Dog Is Watching You!
   •  from a study by Marshall-Pescini et al. 2011. Social eavesdropping in the domestic dog. Animal Behaviour (2011), doi:10.1016/j.anbehav.2011.02.029
•  How to ask a good (i.e., testable) question is hard to teach but not so difficult to learn if given the proper training and opportunity to grow.

"Some people have generous natures, and some people are miserly. The generous ones are happy to share what they have with others, while the miserly folks resent having to share anything with anybody. So if given the choice, which type of person would a dog likely approach first?"

•  "Could I prove this with my dog or cat?" might be an excellent trigger for your muse as you face the next 9 units' requirement that you pose and test just such a question as "...if given the choice, which type of person would a dog likely approach first?"


Unit 1
Hypothesis Testing

Ambrose, H.A., Young, D. (1978). Underwater Orientation in the Sand Fiddler Crab, Uca pugilator. Biol Bull 155: 246-258. (August 1978).
•  Assignment #1; Biology 208; Investigations of Field Biology; Fall, 1975.

Thumin, F. J. (1962). Identification of Cola Beverages. Journal of Applied Psychology, 46, 358-360.
•  a strong hypothesis-testing example used in myriad graduate-level statistics textbooks.
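A taste-identification experiment of the kind cited in this unit lends itself to a simple exact test. The sketch below assumes a hypothetical taster who names the brand correctly in k of n blind trials with three brands on offer, so that pure guessing succeeds with probability 1/3. The counts are invented for illustration and are not taken from any published study.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    This is the one-sided p-value for observing k or more successes
    when the null hypothesis says each trial succeeds with probability p.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical data: 9 correct identifications in 15 blind trials.
    p_value = binomial_upper_tail(9, 15, 1 / 3)
    print(f"one-sided p-value under pure guessing: {p_value:.4f}")
```

A small p-value would suggest the taster does better than guessing; what counts as "small" is a decision the student should fix before collecting data.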


Unit 2
Tests of Independence (correlation)

Kinsey, Alfred C. et al. (1948). Sexual Behavior in the Human Male. Philadelphia: W.B. Saunders; Bloomington, IN: Indiana U. Press.

Kinsey, Alfred C. et al. (1953). Sexual Behavior in the Human Female. Philadelphia: W.B. Saunders; Bloomington, IN: Indiana U. Press.

•  perhaps the most controversial data analyses ever performed.
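For students who want to see what a test of independence computes, here is a minimal sketch of Pearson's chi-square statistic for a contingency table of counts. The table in the example is hypothetical survey data invented for illustration, and the p-value helper covers only the one-degree-of-freedom case (a 2x2 table).

```python
from math import erfc, sqrt


def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of counts.

    Assumes every row total and column total is positive, so that all
    expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_one_df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical 2x2 table: rows are two groups, columns are yes/no answers.
    counts = [[30, 20], [15, 35]]
    stat, df = chi_square_statistic(counts)
    print(f"chi-square = {stat:.3f} on {df} df, p = {p_value_one_df(stat):.4f}")
```

Tables larger than 2x2 need the general chi-square distribution, which a statistics package such as R or SAS provides.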


Unit 3
Discrete Data Analysis

Goodman, L. (1970). The multivariate analysis of qualitative data: Interactions among multiple classifications. JASA, 65: 225-56.

•  the "father of the logit", Dr. Goodman is now a joint professor of both Sociology and Statistics at the University of California-Berkeley.


Unit 4
Analysis of Variance

Fisher, R.A. "The use of multiple measurements in taxonomic problems"; Annals of Eugenics, 7, Part II, 179-188 (1936).

•  there is not an ANOVA class taught on earth that doesn't begin and end with this famous dataset. The endless combinations of complex experimental designs and inferences that can be realized from this single dataset are simply astounding.
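To make the idea of analysis of variance concrete, the sketch below computes the one-way ANOVA F statistic by comparing variation between group means with variation within groups. The three groups are small made-up samples, not values from Fisher's dataset, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of numeric observations, one list per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Variation of this group's mean around the grand mean.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Variation of observations around their own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical measurements for three groups.
    samples = [[5.1, 4.9, 5.4, 5.0], [5.9, 6.1, 5.7, 6.3], [6.5, 6.4, 6.9, 6.6]]
    f_stat, df_b, df_w = one_way_anova_f(samples)
    print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

A large F relative to the F distribution with those degrees of freedom indicates that the group means differ by more than within-group noise would explain.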


Unit 5
Estimation and Regression (OLS)

Forbes, J. (1857). Further experiments and remarks on the measurement of heights by the boiling point of water. Transactions of the Royal Society of Edinburgh, 21, 235-243.

•  One of the first statistical studies by a non-mathematician to be universally accepted by the field of statistics as a flagship example for teaching ordinary least-squares regression.
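The sketch below shows the closed-form least-squares fit of a straight line, the calculation at the heart of this unit. The data points are hypothetical readings invented for illustration, not measurements from the cited study, and the function name is an assumption.

```python
def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and sum of cross-products, both about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical paired readings (predictor, response).
    predictor = [194.5, 197.9, 200.9, 204.6, 209.5]
    response = [20.8, 22.4, 23.9, 25.9, 28.5]
    slope, intercept = ols_fit(predictor, response)
    print(f"fitted line: response = {intercept:.2f} + {slope:.3f} * predictor")
```

Plotting the residuals from such a fit is a useful next step, since a curved pattern suggests that a transformation of the response may be needed.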


Unit 6
Multiple Regression/MANOVA

Harris, R.J. "Directive versus non-directive instructions in the prisoner's dilemma"; presented at the Rocky Mountain Psychological Association (1970).

•  (pp 68-69)
•  from Tim Vogel's first class in multivariate statistics.


Unit 7
Bayesian Inference

An Intuitive Explanation of Bayes' Theorem

•  this is not a scholarly paper, per se, but the best way to see the benefits of Bayesian reasoning and analysis is via Java applets like this one. There are many to choose from, but this one really does a good job!
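For readers without access to an interactive applet, the short sketch below works through Bayes' theorem for a yes/no hypothesis and a positive test result. The prior and the two conditional probabilities are illustrative numbers chosen for this example, and the function and parameter names are hypothetical.

```python
def posterior(prior, p_pos_given_h, p_pos_given_not_h):
    """Return P(H | positive) by Bayes' theorem.

    prior:              P(H), probability of the hypothesis before testing.
    p_pos_given_h:      P(positive | H), the true-positive rate.
    p_pos_given_not_h:  P(positive | not H), the false-positive rate.
    """
    numerator = prior * p_pos_given_h
    evidence = numerator + (1 - prior) * p_pos_given_not_h
    if evidence == 0:
        raise ValueError("a positive result has probability zero under these inputs")
    return numerator / evidence


if __name__ == "__main__":
    # Illustrative numbers: a rare condition and an imperfect test.
    p = posterior(prior=0.01, p_pos_given_h=0.80, p_pos_given_not_h=0.096)
    print(f"P(condition | positive test) = {p:.3f}")
```

With a rare condition, most positive results come from the large group without the condition, which is why the posterior stays well below the test's true-positive rate.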


Unit 8
Unstructured Data Analysis - Data Mining and Text Mining

Authorship of the eleven unattributed Federalist Papers written under the pseudonym Publius and published in various New York newspapers in the run-up to the ratification of the U.S. Constitution (1787-1788).

•  "Who wrote it; Hamilton, Madison, or Jay?"
•  The entire field of text mining began with this question posed about the eleven unattributed "Federalist Papers" of Publius.
•  Tim Vogel hopes to present his own published paper; "Statistical Constitutionality Testing? Citizens United v. the Federal Elections Commission (2010)". (Wired; V?; no??)
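One common way to approach an authorship question like this is to compare how often each candidate author uses ordinary function words. The sketch below illustrates only that counting step; the word list and sample sentence are hypothetical, and nothing here reproduces any published analysis.

```python
import re
from collections import Counter

# Hypothetical list of function words to track.
FUNCTION_WORDS = ["the", "and", "upon", "while", "whilst"]


def rates_per_thousand(text, words=FUNCTION_WORDS):
    """Return each word's frequency per 1,000 tokens of text."""
    # Lower-case the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter(tokens)
    return {w: 1000.0 * counts[w] / len(tokens) for w in words}


if __name__ == "__main__":
    # Made-up sample sentence standing in for an essay of known authorship.
    sample = "Upon the whole, the people and the states must decide while they can."
    for word, rate in rates_per_thousand(sample).items():
        print(f"{word:>7}: {rate:6.1f} per 1,000 words")
```

Computing these rates for essays of known authorship and for a disputed essay gives a set of numbers that can then be compared with the inferential tools from the earlier units.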


Unit 9
Answer a "nagging" research question of your own

•  logistic regression
•  logits/probits
•  canonical correlation
•  principal components analysis
•  factor analysis
•  supervised clustering
•  unsupervised clustering
•  multidimensional scaling
•  similarity/distance measures
•  cladistics (numerical taxonomy)
•  neural nets
•  hidden Markov models
•  Monte Carlo simulation
•  machine learning algorithms
•  linear programming
•  dynamic programming
•  optimized auctions
•  pick your poison!


Instructors

Timothy Vogel
•  B.S., Genetics; University of Illinois
•  M.S., Statistics; Florida State University
•  Biologist; Newfound Harbor Marine Institute; Big Pine Key, FL
•  Management Consultant; The Werner Group; NY, NY
•  Analytics Product Manager; National Demographics & Lifestyles, Inc; Denver, CO
•  Sr. Manager Data Mining; MCI; Denver, CO
•  Analytics Architect; Macromedia, Inc; San Francisco, CA
•  Sr. Software Engineer/Statistician; IBM; RTP, NC
•  Chief Scientist; Aggregate Knowledge, Inc; San Mateo, CA
•  Founder/CEO; on-to-logica.com; San Mateo, CA

Nathan Good
•  B.S., Mathematics; University of Minnesota
•  M.S., Computer Science; University of Minnesota
•  Ph.D., Computer Science; U-C Berkeley (Hal Varian, ?, ?)
•  Research Fellow; PARC/Xerox
•  Contract Researcher; Aggregate Knowledge, Inc; San Mateo, CA
•  Founder; Good Research


Public Datasets

•  UC-Irvine Machine Learning repository
   •  http://archive.ics.uci.edu/ml/datasets.html
•  StatSci.org
   •  http://www.statsci.org/datasets.html
•  Journal of Applied Econometrics Data Archive
   •  http://qed.econ.queensu.ca/jae/
•  Monte Carlo Simulation
   •  http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html
•  Amazon's Public Database in the "Clouds"
   •  http://aws.amazon.com/publicdatasets/
•  SBA City & County Web Database
   •  http://www.data.gov/
•  Google public data explorer
   •  http://www.google.com/publicdata/directory
•  Knowledge Discovery in Databases
   •  http://www.kdnuggets.com/2011/02/free-public-datasets.html


Mentors' Links

•  Dr. Harrison J. Ambrose III
   •  Writing a Scientific Research Article
•  Dr. Con Slobodchikoff
   •  Animal Communication
•  Dr. Craig Loehle
   •  A guide to increased creativity in research - inspiration or perspiration?
•  Dr. Leo Goodman
   •  Sociologist/Demographer
•  Dr. Jeffery Carrier
   •  Shark Biologist/Physiologist
