
Statistics

More probability density is found as one gets closer to the expected (mean) value in a normal distribution. Statistics used in standardized testing assessment are shown. The scales include standard deviations, cumulative percentages, percentile equivalents, Z-scores, T-scores, standard nines, and percentages in standard nines.

Scatter plots are used in descriptive statistics to show the observed relationships between different variables.

Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data.[1] In applying statistics to, e.g., a scientific, industrial, or societal problem, it is conventional to begin with a statistical population or a statistical model process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal". Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments.[1]

When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

Two main statistical methodologies are used in data analysis: descriptive statistics, which summarizes data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draws conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).[2] Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. To make an inference upon unknown quantities, one or more estimators are evaluated using the sample.

Standard statistical procedures involve the development of a null hypothesis, a general statement or default position that there is no relationship between two quantities. Rejecting or disproving the null hypothesis is a central task in the modern practice of science, and gives a precise sense in which a claim is capable of being proven false. What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, giving a "false positive") and Type II errors (the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative"). A critical region is the set of values of the estimator that leads to refuting the null hypothesis. The probability of type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (statistical significance), and the probability of type II error is the probability that the estimator does not belong to the critical region given that the alternative hypothesis is true.



The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false. Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important. The presence of missing data and/or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value in 95% of all possible cases.

In statistics, dependence is any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. If two variables are correlated, they may or may not be the cause of one another. The correlation phenomena could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable.

Statistics can be said to have begun in ancient civilization, going back at least to the 5th century BC, but it was not until the 18th century that it started to draw more heavily from calculus and probability theory. Statistics continues to be an area of active research, for example on the problem of how to analyze big data.

1 Scope

Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data,[3] or as a branch of mathematics.[4] Some consider statistics to be a distinct mathematical science rather than a branch of mathematics.[5][6]

1.1 Mathematical statistics

Main article: Mathematical statistics

Mathematical statistics is the application of mathematics to statistics, which was originally conceived as the science of the state, that is, the collection and analysis of facts about a country: its economy, land, military, population, and so forth. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory.[7][8]

2 Overview

In applying statistics to, e.g., a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal".

Ideally, statisticians compile data about the entire population (an operation called a census). This may be organized by governmental statistical institutes. Descriptive statistics can be used to summarize the population data. Numerical descriptors include mean and standard deviation for continuous data types (like income), while frequency and percentage are more useful in terms of describing categorical data (like race).

When a census is not feasible, a chosen subset of the population called a sample is studied. Once a sample that is representative of the population is determined, data is collected for the sample members in an observational or experimental setting. Again, descriptive statistics can be used to summarize the sample data. However, the drawing of the sample has been subject to an element of randomness, hence the established numerical descriptors from the sample are also subject to uncertainty. To still draw meaningful conclusions about the entire population, inferential statistics is needed. It uses patterns in the sample data to draw inferences about the population represented, accounting for randomness. These inferences may take the form of: answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation), and modeling relationships within the data (for example, using regression analysis). Inference can extend to forecasting, prediction, and estimation of unobserved values either in or associated with the population being studied; it can include extrapolation and interpolation of time series or spatial data, and can also include data mining.
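As a concrete illustration of the numerical descriptors mentioned above, the following minimal Python sketch (standard library only, with made-up income and category values) computes a mean and standard deviation for continuous data, and frequencies and percentages for categorical data:

    # Illustrative sketch, not from the article: descriptive statistics for a
    # small made-up sample.
    import statistics
    from collections import Counter

    incomes = [32_000, 41_500, 28_750, 55_000, 47_250, 39_900]   # continuous data
    groups = ["A", "B", "A", "C", "A", "B"]                       # categorical data

    # Numerical descriptors for continuous data: mean and standard deviation.
    print("mean income:", statistics.mean(incomes))
    print("sample standard deviation:", statistics.stdev(incomes))

    # Frequency and percentage are more natural descriptors for categorical data.
    counts = Counter(groups)
    for category, count in counts.items():
        print(category, count, f"{100 * count / len(groups):.1f}%")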

    3 Data collection

    3.1 Sampling

In case census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Statistics itself also provides tools for prediction and forecasting through the use of data and statistical models. To use a sample as a guide to an entire population, it is important that it truly represents the overall population. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. A major problem lies in determining the extent to which the sample chosen is actually representative. Statistics offers methods to estimate and correct for any random trending within the sample and data collection procedures.


There are also methods of experimental design for experiments that can lessen these issues at the outset of a study, strengthening its capability to discern truths about the population.

Sampling theory is part of the mathematical discipline of probability theory. Probability is used in mathematical statistics (alternatively, "statistical theory") to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures. The use of any statistical method is valid when the system or population under consideration satisfies the assumptions of the method. The difference in point of view between classic probability theory and sampling theory is, roughly, that probability theory starts from the given parameters of a total population to deduce probabilities that pertain to samples. Statistical inference, however, moves in the opposite direction, inductively inferring from samples to the parameters of a larger or total population.
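The following small sketch, not taken from the article, illustrates simple random sampling in Python: a synthetic population is generated, a sample is drawn from it, and the sample mean is used as an estimate of the population mean.

    # Illustrative sketch with synthetic data: simple random sampling and
    # estimation of a population mean from the sample.
    import random
    import statistics

    random.seed(0)
    population = [random.gauss(170, 10) for _ in range(100_000)]  # synthetic "population"

    sample = random.sample(population, k=200)                     # simple random sample
    print("population mean:", statistics.mean(population))
    print("sample estimate :", statistics.mean(sample))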

3.2 Experimental and observational studies

A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on dependent variables or response. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable are observed. The difference between the two types lies in how the study is actually conducted. Each can be very effective. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and response are investigated. While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, like natural experiments and observational studies,[9] for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.

    3.2.1 Experiments

    The basic steps of a statistical experiment are:

1. Planning the research, including finding the number of replicates of the study, using the following information: preliminary estimates regarding the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Consideration of the selection of experimental subjects and the ethics of research is necessary. Statisticians recommend that experiments compare (at least) one new treatment with a standard treatment or control, to allow an unbiased estimate of the difference in treatment effects.

2. Design of experiments, using blocking to reduce the influence of confounding variables, and randomized assignment of treatments to subjects to allow unbiased estimates of treatment effects and experimental error. At this stage, the experimenters and statisticians write the experimental protocol that will guide the performance of the experiment and which specifies the primary analysis of the experimental data.

3. Performing the experiment following the experimental protocol and analyzing the data following the experimental protocol.

4. Further examining the data set in secondary analyses, to suggest new hypotheses for future study.

    5. Documenting and presenting the results of the study.

Experiments on human behavior have special concerns. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in determining whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked if the changes in illumination affected productivity. It turned out that productivity indeed improved (under the experimental conditions). However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a control group and blindness. The Hawthorne effect refers to finding that an outcome (in this case, worker productivity) changed due to observation itself. Those in the Hawthorne study became more productive not because the lighting was changed but because they were being observed.[10]

    3.2.2 Observational study

An example of an observational study is one that explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a case-control study, and then look for the number of cases of lung cancer in each group.


    4 Types of data

Main articles: Statistical data type and Levels of measurement

Various attempts have been made to produce a taxonomy of levels of measurement. The psychophysicist Stanley Smith Stevens defined nominal, ordinal, interval, and ratio scales. Nominal measurements do not have meaningful rank order among values, and permit any one-to-one transformation. Ordinal measurements have imprecise differences between consecutive values, but have a meaningful order to those values, and permit any order-preserving transformation. Interval measurements have meaningful distances between measurements defined, but the zero value is arbitrary (as in the case with longitude and temperature measurements in Celsius or Fahrenheit), and permit any linear transformation. Ratio measurements have both a meaningful zero value and the distances between different measurements defined, and permit any rescaling transformation.

Because variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically, sometimes they are grouped together as categorical variables, whereas ratio and interval measurements are grouped together as quantitative variables, which can be either discrete or continuous, due to their numerical nature. Such distinctions can often be loosely correlated with data type in computer science, in that dichotomous categorical variables may be represented with the Boolean data type, polytomous categorical variables with arbitrarily assigned integers in the integral data type, and continuous variables with the real data type involving floating point computation. But the mapping of computer science data types to statistical data types depends on which categorization of the latter is being implemented.

Other categorizations have been proposed. For example, Mosteller and Tukey (1977)[11] distinguished grades, ranks, counted fractions, counts, amounts, and balances. Nelder (1990)[12] described continuous counts, continuous ratios, count ratios, and categorical modes of data. See also Chrisman (1998)[13] and van den Berg (1991).[14]
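The loose mapping between statistical data types and computer science data types described above can be sketched as follows; the variable names and codes are purely illustrative:

    # Illustrative sketch of the mapping described above (names and codes made up).
    smoker: bool = True                          # dichotomous categorical -> Boolean
    blood_type_codes = {"A": 0, "B": 1, "AB": 2, "O": 3}
    blood_type: int = blood_type_codes["AB"]     # polytomous categorical -> arbitrary integer code
    body_temperature_c: float = 36.6             # continuous (interval scale) -> floating point

    print(smoker, blood_type, body_temperature_c)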

The issue of whether or not it is appropriate to apply different kinds of statistical methods to data obtained from different kinds of measurement procedures is complicated by issues concerning the transformation of variables and the precise interpretation of research questions. "The relationship between the data and what they describe merely reflects the fact that certain kinds of statistical statements may have truth values which are not invariant under some transformations. Whether or not a transformation is sensible to contemplate depends on the question one is trying to answer" (Hand, 2004, p. 82).[15]

5 Terminology and theory of inferential statistics

5.1 Statistics, estimators and pivotal quantities

Consider independent identically distributed (IID) random variables with a given probability distribution: standard statistical inference and estimation theory defines a random sample as the random vector given by the column vector of these IID variables.[16] The population being examined is described by a probability distribution that may have unknown parameters.

A statistic is a random variable that is a function of the random sample, but not a function of unknown parameters. The probability distribution of the statistic, though, may have unknown parameters.

Consider now a function of the unknown parameter: an estimator is a statistic used to estimate such a function. Commonly used estimators include the sample mean, unbiased sample variance and sample covariance.

A random variable that is a function of the random sample and of the unknown parameter, but whose probability distribution does not depend on the unknown parameter, is called a pivotal quantity or pivot. Widely used pivots include the z-score, the chi square statistic and Student's t-value.

Between two estimators of a given parameter, the one with lower mean squared error is said to be more efficient. Furthermore, an estimator is said to be unbiased if its expected value is equal to the true value of the unknown parameter being estimated, and asymptotically unbiased if its expected value converges in the limit to the true value of such a parameter.

Other desirable properties for estimators include: UMVUE estimators, which have the lowest variance for all possible values of the parameter to be estimated (this is usually an easier property to verify than efficiency), and consistent estimators, which converge in probability to the true value of such a parameter.

This still leaves the question of how to obtain estimators in a given situation and carry out the computation. Several methods have been proposed: the method of moments, the maximum likelihood method, the least squares method and the more recent method of estimating equations.
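A short Python sketch of these ideas, assuming IID draws from a normal population with known standard deviation sigma, so that the z pivot applies (all numbers are illustrative):

    # Sketch of the estimators and the z pivot named above, on synthetic data.
    import math
    import random
    import statistics

    random.seed(1)
    mu_true, sigma_known = 5.0, 2.0
    sample = [random.gauss(mu_true, sigma_known) for _ in range(50)]

    sample_mean = statistics.mean(sample)        # estimator of mu
    unbiased_var = statistics.variance(sample)   # unbiased sample variance (n - 1 denominator)

    # Pivotal quantity: its distribution (standard normal) does not depend on mu.
    z = (sample_mean - mu_true) * math.sqrt(len(sample)) / sigma_known
    print(sample_mean, unbiased_var, z)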

5.2 Null hypothesis and alternative hypothesis

Interpretation of statistical information can often involve the development of a null hypothesis, in that the assumption is that whatever is proposed as a cause has no effect on the variable being measured.

  • 5.4 Interval estimation 5

The best illustration for a novice is the predicament encountered by a jury trial. The null hypothesis, H0, asserts that the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the defendant is guilty. The indictment comes because of suspicion of guilt. The H0 (status quo) stands in opposition to H1 and is maintained unless H1 is supported by evidence beyond a reasonable doubt. However, "failure to reject H0" in this case does not imply innocence, but merely that the evidence was insufficient to convict. So the jury does not necessarily accept H0 but fails to reject H0. While one cannot prove a null hypothesis, one can test how close it is to being true with a power test, which tests for type II errors.

What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis.

5.3 Error

Working from a null hypothesis, two basic forms of error are recognized:

Type I errors, where the null hypothesis is falsely rejected, giving a "false positive".

Type II errors, where the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative" (a simulation sketch of both error rates is given below).
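The sketch below, not taken from the article, estimates both error rates by simulation for a one-sided z-test of H0: mu = 0 at the 5% level, assuming a known standard deviation (illustrative values only):

    # Illustrative simulation of Type I and Type II error rates for a z-test.
    import math
    import random
    import statistics

    random.seed(5)
    sigma, n, z_crit, trials = 1.0, 25, 1.645, 4000   # 1.645 = one-sided 5% critical value

    def reject(mu: float) -> bool:
        """Draw one sample from N(mu, sigma) and apply the test for H0: mu = 0."""
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        z = statistics.mean(sample) * math.sqrt(n) / sigma
        return z > z_crit                              # critical region

    type_1 = sum(reject(0.0) for _ in range(trials)) / trials       # H0 true, falsely rejected
    type_2 = sum(not reject(0.5) for _ in range(trials)) / trials   # H1 true (mu = 0.5), missed
    print(f"Type I rate ~ {type_1:.3f}, Type II rate ~ {type_2:.3f}, power ~ {1 - type_2:.3f}")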

Standard deviation refers to the extent to which individual observations in a sample differ from a central value, such as the sample or population mean, while standard error refers to an estimate of the difference between the sample mean and the population mean.

A statistical error is the amount by which an observation differs from its expected value; a residual is the amount an observation differs from the value the estimator of the expected value assumes on a given sample (also called prediction).

Mean squared error is used for obtaining efficient estimators, a widely used class of estimators. Root mean square error is simply the square root of mean squared error.

Many statistical methods seek to minimize the residual sum of squares, and these are called "methods of least squares" in contrast to least absolute deviations. The latter gives equal weight to small and big errors, while the former gives more weight to large errors. The residual sum of squares is also differentiable, which provides a handy property for doing regression. Least squares applied to linear regression is called the ordinary least squares method, and least squares applied to nonlinear regression is called non-linear least squares. Also, in a linear regression model the non-deterministic part of the model is called the error term, disturbance or, more simply, noise. Both linear regression and non-linear regression are addressed in polynomial least squares, which also describes the variance in a prediction of the dependent variable (y axis) as a function of the independent variable (x axis) and the deviations (errors, noise, disturbances) from the estimated (fitted) curve.

A least squares fit: in red the points to be fitted, in blue the fitted line.

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important. The presence of missing data and/or censoring may result in biased estimates, and specific techniques have been developed to address these problems.[17]
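As an illustration of least squares, the sketch below fits a straight line y = a + bx to made-up data by minimizing the residual sum of squares, using the closed-form solution for simple linear regression, and reports the RSS and root mean square error:

    # Minimal ordinary least squares sketch on toy data.
    import math

    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 8.1, 9.8]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x

    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    rss = sum(r ** 2 for r in residuals)          # residual sum of squares
    rmse = math.sqrt(rss / n)                     # root mean square error
    print(f"fit: y = {a:.3f} + {b:.3f} x, RSS = {rss:.3f}, RMSE = {rmse:.3f}")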

    5.4 Interval estimation

Main article: Interval estimation

Confidence intervals: the red line is the true value for the mean in this example, the blue lines are random confidence intervals for 100 realizations.

Most studies only sample part of a population, so results do not fully represent the whole population. Any estimates obtained from the sample only approximate the population value. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Often they are expressed as 95% confidence intervals. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value in 95% of all possible cases.


This does not imply that the probability that the true value is in the confidence interval is 95%. From the frequentist perspective, such a claim does not even make sense, as the true value is not a random variable. Either the true value is or is not within the given interval. However, it is true that, before any data are sampled and given a plan for how to construct the confidence interval, the probability is 95% that the yet-to-be-calculated interval will cover the true value: at this point, the limits of the interval are yet-to-be-observed random variables. One approach that does yield an interval that can be interpreted as having a given probability of containing the true value is to use a credible interval from Bayesian statistics: this approach depends on a different way of interpreting what is meant by "probability", that is, as a Bayesian probability.

In principle confidence intervals can be symmetrical or asymmetrical. An interval can be asymmetrical because it works as a lower or upper bound for a parameter (left-sided interval or right-sided interval), but it can also be asymmetrical because the two-sided interval is built violating symmetry around the estimate. Sometimes the bounds for a confidence interval are reached asymptotically, and these are used to approximate the true bounds.
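The repeated-sampling reading of a 95% confidence interval can be illustrated by simulation. The sketch below assumes a normal population with known standard deviation (so the z-based interval applies) and checks that roughly 95% of the intervals constructed from repeated samples cover the true mean:

    # Illustrative coverage check for a z-based 95% confidence interval.
    import math
    import random
    import statistics

    random.seed(2)
    mu_true, sigma, n, z95 = 10.0, 3.0, 40, 1.96

    covered = 0
    trials = 2000
    for _ in range(trials):
        sample = [random.gauss(mu_true, sigma) for _ in range(n)]
        m = statistics.mean(sample)
        half_width = z95 * sigma / math.sqrt(n)
        if m - half_width <= mu_true <= m + half_width:
            covered += 1

    print("empirical coverage:", covered / trials)   # close to 0.95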

5.5 Significance

Main article: Statistical significance

Statistics rarely give a simple yes/no type answer to the question under analysis. Interpretation often comes down to the level of statistical significance applied to the numbers, and often refers to the probability of a value accurately rejecting the null hypothesis (sometimes referred to as the p-value).

The standard approach[16] is to test a null hypothesis against an alternative hypothesis. A critical region is the set of values of the estimator that leads to refuting the null hypothesis. The probability of type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (statistical significance), and the probability of type II error is the probability that the estimator does not belong to the critical region given that the alternative hypothesis is true. The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.

Referring to statistical significance does not necessarily mean that the overall result is significant in real world terms. For example, in a large study of a drug it may be shown that the drug has a statistically significant but very small beneficial effect, such that the drug is unlikely to help the patient noticeably.

While in principle the acceptable level of statistical significance may be subject to debate, the p-value is the smallest significance level that allows the test to reject the null hypothesis.


A p-value (shaded green area) is the probability of an observed (or more extreme) result assuming that the null hypothesis is true.

Important: Pr(observation | hypothesis) ≠ Pr(hypothesis | observation). The probability of observing a result given that some hypothesis is true is not equivalent to the probability that a hypothesis is true given that some result has been observed.

    Using the p-value as a score is committing an egregious logical error: the transposed conditional fallacy.

In this graph the black line is the probability distribution for the test statistic, the critical region is the set of values to the right of the observed data point (observed value of the test statistic), and the p-value is represented by the green area.

This is logically equivalent to saying that the p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the test statistic. Therefore, the smaller the p-value, the lower the probability of committing a type I error.

Some problems are usually associated with this framework (see criticism of hypothesis testing):

A difference that is highly statistically significant can still be of no practical significance, but it is possible to properly formulate tests to account for this. One response involves going beyond reporting only the significance level to include the p-value when reporting whether a hypothesis is rejected or accepted. The p-value, however, does not indicate the size or importance of the observed effect and can also seem to exaggerate the importance of minor differences in large studies. A better and increasingly common approach is to report confidence intervals. Although these are produced from the same calculations as those of hypothesis tests or p-values, they describe both the size of the effect and the uncertainty surrounding it.

Fallacy of the transposed conditional, aka prosecutor's fallacy: criticisms arise because the hypothesis testing approach forces one hypothesis (the null hypothesis) to be favored, since what is being evaluated is the probability of the observed result given the null hypothesis and not the probability of the null hypothesis given the observed result. An alternative to this approach is offered by Bayesian inference, although it requires establishing a prior probability.[18]

Rejecting the null hypothesis does not automatically prove the alternative hypothesis.

As with everything in inferential statistics, it relies on sample size, and therefore under fat tails p-values may be seriously miscomputed.
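As a concrete illustration of the p-value definition given above, the following sketch computes a one-sided z-test p-value for H0: mu = 0 against H1: mu > 0, assuming a known standard deviation (all numbers are illustrative):

    # Illustrative one-sided z-test p-value computation.
    import math

    def normal_sf(z: float) -> float:
        """Survival function P(Z >= z) for a standard normal variable."""
        return 0.5 * math.erfc(z / math.sqrt(2))

    sample_mean, sigma, n = 0.42, 1.0, 25
    z = sample_mean * math.sqrt(n) / sigma    # observed test statistic
    p_value = normal_sf(z)                    # P(result at least as extreme | H0 true)

    alpha = 0.05
    print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0 at 5%: {p_value < alpha}")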

5.6 Examples

Some well-known statistical tests and procedures are:

Analysis of variance (ANOVA)
Chi-squared test
Correlation
Factor analysis
Mann-Whitney U
Mean square weighted deviation (MSWD)
Pearson product-moment correlation coefficient
Regression analysis
Spearman's rank correlation coefficient
Student's t-test
Time series analysis
Conjoint analysis

6 Misuse of statistics

Main article: Misuse of statistics

Misuse of statistics can produce subtle but serious errors in description and interpretation: subtle in the sense that even experienced professionals make such errors, and serious in the sense that they can lead to devastating decision errors. For instance, social policy, medical practice, and the reliability of structures like bridges all rely on the proper use of statistics.

Even when statistical techniques are correctly applied, the results can be difficult to interpret for those lacking expertise. The statistical significance of a trend in the data, which measures the extent to which a trend could be caused by random variation in the sample, may or may not agree with an intuitive sense of its significance. The set of basic statistical skills (and skepticism) that people need to deal with information in their everyday lives properly is referred to as statistical literacy.

There is a general perception that statistical knowledge is all-too-frequently intentionally misused by finding ways to interpret only the data that are favorable to the presenter.[19]

A mistrust and misunderstanding of statistics is associated with the quotation, "There are three kinds of lies: lies, damned lies, and statistics". Misuse of statistics can be both inadvertent and intentional, and the book How to Lie with Statistics[19] outlines a range of considerations. In an attempt to shed light on the use and misuse of statistics, reviews of statistical techniques used in particular fields are conducted (e.g. Warne, Lazo, Ramos, and Ritter (2012)).[20]

Ways to avoid misuse of statistics include using proper diagrams and avoiding bias.[21] Misuse can occur when conclusions are overgeneralized and claimed to be representative of more than they really are, often by either deliberately or unconsciously overlooking sampling bias.[22] Bar graphs are arguably the easiest diagrams to use and understand, and they can be made either by hand or with simple computer programs.[21] Unfortunately, most people do not look for bias or errors, so they are not noticed. Thus, people may often believe that something is true even if it is not well represented.[22] To make data gathered from statistics believable and accurate, the sample taken must be representative of the whole.[23] According to Huff, "The dependability of a sample can be destroyed by [bias]... allow yourself some degree of skepticism."[24]

To assist in the understanding of statistics, Huff proposed a series of questions to be asked in each case:[24]

Who says so? (Does he/she have an axe to grind?)
How does he/she know? (Does he/she have the resources to know the facts?)

What's missing? (Does he/she give us a complete picture?)

Did someone change the subject? (Does he/she offer us the right answer to the wrong problem?)

Does it make sense? (Is his/her conclusion logical and consistent with what we already know?)

The confounding variable problem: X and Y may be correlated, not because there is a causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.


6.1 Misinterpretation: correlation

The concept of correlation is particularly noteworthy for the potential confusion it can cause. Statistical analysis of a data set often reveals that two variables (properties) of the population under consideration tend to vary together, as if they were connected. For example, a study of annual income that also looks at age of death might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated; however, they may or may not be the cause of one another. The correlation phenomena could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable. For this reason, there is no way to immediately infer the existence of a causal relationship between the two variables. (See Correlation does not imply causation.)
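A lurking variable can be illustrated by simulation. In the sketch below (synthetic data, not from the article), X and Y are each driven only by a third variable Z, yet they come out strongly correlated:

    # Illustrative confounding simulation: X and Y correlate via Z only.
    import random

    random.seed(3)

    def pearson(xs, ys):
        """Pearson correlation coefficient of two equal-length sequences."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    z = [random.gauss(0, 1) for _ in range(5000)]            # confounder
    x = [zi + random.gauss(0, 0.5) for zi in z]              # depends only on Z
    y = [2 * zi + random.gauss(0, 0.5) for zi in z]          # depends only on Z

    print("corr(X, Y) =", round(pearson(x, y), 3))           # strongly positive, yet no X -> Y causation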

    7 History of statistical science

Blaise Pascal, an early pioneer on the mathematics of probability.

Main articles: History of statistics and Founders of statistics

Statistical methods date back at least to the 5th century BC.

Some scholars pinpoint the origin of statistics to 1663, with the publication of Natural and Political Observations upon the Bills of Mortality by John Graunt.[25] Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data, hence its "stat-" etymology.

The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general. Today, statistics is widely employed in government, business, and natural and social sciences.

Its mathematical foundations were laid in the 17th century with the development of probability theory by Blaise Pascal and Pierre de Fermat. Mathematical probability theory arose from the study of games of chance, although the concept of probability was already examined in medieval law and by philosophers such as Juan Caramuel.[26] The method of least squares was first described by Adrien-Marie Legendre in 1805.

    Karl Pearson, the founder of mathematical statistics.

The modern field of statistics emerged in the late 19th and early 20th century in three stages.[27] The first wave, at the turn of the century, was led by the work of Sir Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics as well. Galton's contributions to the field included introducing the concepts of standard deviation, correlation, regression and the application of these methods to the study of the variety of human characteristics (height, weight, eyelash length, among others).[28] Pearson developed the correlation coefficient, defined as a product-moment,[29] the method of moments for the fitting of distributions to samples and the Pearson system of continuous curves, among many other things.[30] Galton and Pearson founded Biometrika as the first journal of mathematical statistics and biometry, and the latter founded the world's first university statistics department at University College London.[31]

The second wave of the 1910s and 20s was initiated by William Gosset, and reached its culmination in the insights of Sir Ronald Fisher, who wrote the textbooks that were to define the academic discipline in universities around the world. Fisher's most important publications were his 1916 seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance and his classic 1925 work Statistical Methods for Research Workers. His paper was the first to use the statistical term variance.


    Ronald Fisher coined the term "null hypothesis".

He developed rigorous experimental models and also originated the concepts of sufficiency, ancillary statistics, Fisher's linear discriminator and Fisher information.[32]

The final wave, which mainly saw the refinement and expansion of earlier developments, emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the 1930s. They introduced the concepts of "Type II" error, power of a test and confidence intervals. Jerzy Neyman in 1934 showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.[33]

Today, statistical methods are applied in all fields that involve decision making, for making accurate inferences from a collated body of data and for making decisions in the face of uncertainty based on statistical methodology. The use of modern computers has expedited large-scale statistical computations, and has also made possible new methods that are impractical to perform manually. Statistics continues to be an area of active research, for example on the problem of how to analyze big data.[34]

    8 Applications

8.1 Applied statistics, theoretical statistics and mathematical statistics

Applied statistics comprises descriptive statistics and the application of inferential statistics.[35] Theoretical statistics concerns both the logical arguments underlying the justification of approaches to statistical inference, as well as encompassing mathematical statistics.

Mathematical statistics includes not only the manipulation of probability distributions necessary for deriving results related to methods of estimation and inference, but also various aspects of computational statistics and the design of experiments.

8.2 Machine learning and data mining

There are two applications for machine learning and data mining: data management and data analysis. Statistics tools are necessary for the data analysis.

8.3 Statistics in society

Statistics is applicable to a wide variety of academic disciplines, including natural and social sciences, government, and business. Statistical consultants can help organizations and companies that don't have in-house expertise relevant to their particular questions.

    8.4 Statistical computing

    gretl, an example of an open source statistical package

    Main article: Computational statistics

The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science. Early statistical models were almost always from the class of linear models, but powerful computers, coupled with suitable numerical algorithms, caused an increased interest in nonlinear models (such as neural networks) as well as the creation of new types, such as generalized linear models and multilevel models.

Increased computing power has also led to the growing popularity of computationally intensive methods based on resampling, such as permutation tests and the bootstrap, while techniques such as Gibbs sampling have made use of Bayesian models more feasible.


The computer revolution has implications for the future of statistics, with a new emphasis on experimental and empirical statistics. A large number of both general and special purpose statistical software packages are now available.
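The bootstrap mentioned above can be sketched in a few lines: resample the observed data with replacement many times, recompute the statistic of interest on each resample, and use the spread of those replicates to approximate its sampling variability (toy data, percentile interval):

    # Illustrative bootstrap estimate of the uncertainty of a sample mean.
    import random
    import statistics

    random.seed(4)
    data = [random.expovariate(1 / 5.0) for _ in range(100)]    # skewed toy sample

    boot_means = []
    for _ in range(5000):
        resample = random.choices(data, k=len(data))            # sample with replacement
        boot_means.append(statistics.mean(resample))

    boot_means.sort()
    lo = boot_means[int(0.025 * len(boot_means))]
    hi = boot_means[int(0.975 * len(boot_means))]
    print("sample mean:", round(statistics.mean(data), 2))
    print("bootstrap 95% percentile interval:", (round(lo, 2), round(hi, 2)))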

8.5 Statistics applied to mathematics or the arts

Traditionally, statistics was concerned with drawing inferences using a semi-standardized methodology that was required learning in most sciences. This has changed with the use of statistics in non-inferential contexts. What was once considered a dry subject, taken in many fields as a degree-requirement, is now viewed enthusiastically. Initially derided by some mathematical purists, it is now considered essential methodology in certain areas.

In number theory, scatter plots of data generated by a distribution function may be transformed with familiar tools used in statistics to reveal underlying patterns, which may then lead to hypotheses.

Methods of statistics including predictive methods in forecasting are combined with chaos theory and fractal geometry to create video works that are considered to have great beauty.

The process art of Jackson Pollock relied on artistic experiments whereby underlying distributions in nature were artistically revealed. With the advent of computers, statistical methods were applied to formalize such distribution-driven natural processes to make and analyze moving video art.

Methods of statistics may be used predictively in performance art, as in a card trick based on a Markov process that only works some of the time, the occasion of which can be predicted using statistical methodology.

Statistics can be used to predictively create art, as in the statistical or stochastic music invented by Iannis Xenakis, where the music is performance-specific. Though this type of artistry does not always come out as expected, it does behave in ways that are predictable and tunable using statistics.

9 Specialized disciplines

Main article: List of fields of application of statistics

Statistical techniques are used in a wide range of types of scientific and social research, including: biostatistics, computational biology, computational sociology, network biology, social science, sociology and social research.

Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:

Actuarial science (assesses risk in the insurance and finance industries)
Applied information economics
Astrostatistics (statistical evaluation of astronomical data)
Biostatistics
Business statistics
Chemometrics (for analysis of data from chemistry)
Data mining (applying statistics and pattern recognition to discover knowledge from data)
Demography
Econometrics (statistical analysis of economic data)
Energy statistics
Engineering statistics
Epidemiology (statistical analysis of disease)
Geography and Geographic Information Systems, specifically in spatial analysis
Image processing
Medical statistics
Psychological statistics
Reliability engineering
Social statistics

In addition, there are particular types of statistical analysis that have also developed their own specialised terminology and methodology:

Bootstrap / jackknife resampling
Multivariate statistics
Statistical classification
Structured data analysis (statistics)
Structural equation modelling
Survey methodology
Survival analysis
Statistics in various sports, particularly baseball (known as sabermetrics) and cricket


Statistics forms a key basic tool in business and manufacturing as well. It is used to understand measurement systems variability, control processes (as in statistical process control or SPC), summarize data, and make data-driven decisions. In these roles, it is a key tool, and perhaps the only reliable tool.

10 See also

Main article: Outline of statistics

Abundance estimation
Glossary of probability and statistics
List of academic statistical associations
List of important publications in statistics
List of national and international statistical services
List of statistical packages (software)
List of statistics articles
List of university statistical consulting centers
Notation in probability and statistics
Consultation in statistics

    Foundations and major areas of statistics

Foundations of statistics
List of statisticians
Official statistics
Multivariate analysis of variance

11 References

[1] Dodge, Y. (2006). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9.

[2] Lund Research Ltd. "Descriptive and Inferential Statistics". statistics.laerd.com. Retrieved 2014-03-23.

[3] Moses, Lincoln E. (1986). Think and Explain with Statistics. Addison-Wesley. ISBN 978-0-201-15619-5. pp. 1-3.

[4] Hays, William Lee (1973). Statistics for the Social Sciences. Holt, Rinehart and Winston. p. xii. ISBN 978-0-03-077945-9.

[5] Moore, David (1992). "Teaching Statistics as a Respectable Subject". In F. Gordon and S. Gordon. Statistics for the Twenty-First Century. Washington, DC: The Mathematical Association of America. pp. 14-25. ISBN 978-0-88385-078-7.

[6] Chance, Beth L.; Rossman, Allan J. (2005). "Preface". Investigating Statistical Concepts, Applications, and Methods (PDF). Duxbury Press. ISBN 978-0-495-05064-3.

[7] Kannan, D.; Lakshmikantham, V., eds. (2002). Handbook of Stochastic Analysis and Applications. New York: M. Dekker. ISBN 0824706609.

[8] Schervish, Mark J. (1995). Theory of Statistics (corr. 2nd print. ed.). New York: Springer. ISBN 0387945466.

[9] Freedman, D. A. (2005). Statistical Models: Theory and Practice. Cambridge University Press. ISBN 978-0-521-67105-7.

[10] McCarney R, Warner J, Iliffe S, van Haselen R, Griffin M, Fisher P (2007). "The Hawthorne Effect: a randomised, controlled trial". BMC Med Res Methodol 7 (1): 30. doi:10.1186/1471-2288-7-30. PMC 1936999. PMID 17608932.

[11] Mosteller, F.; Tukey, J. W. (1977). Data Analysis and Regression. Boston: Addison-Wesley.

[12] Nelder, J. A. (1990). "The knowledge needed to computerise the analysis and interpretation of statistical information". In Expert Systems and Artificial Intelligence: the Need for Information about Data. Library Association Report, London, March, 23-27.

[13] Chrisman, Nicholas R. (1998). "Rethinking Levels of Measurement for Cartography". Cartography and Geographic Information Science 25 (4): 231-242. doi:10.1559/152304098782383043.

[14] van den Berg, G. (1991). Choosing an Analysis Method. Leiden: DSWO Press.

[15] Hand, D. J. (2004). Measurement Theory and Practice: The World through Quantification. London, UK: Arnold.

[16] Piazza, Elio (2007). Probabilità e Statistica. Esculapio.

[17] Rubin, Donald B.; Little, Roderick J. A. (2002). Statistical Analysis with Missing Data. New York: Wiley.

[18] Ioannidis, J. P. A. (2005). "Why Most Published Research Findings Are False". PLoS Medicine 2 (8): e124. doi:10.1371/journal.pmed.0020124. PMC 1182327. PMID 16060722.

[19] Huff, Darrell (1954). How to Lie with Statistics. New York: W. W. Norton & Company. ISBN 0-393-31072-8.

[20] Warne, R.; Lazo; Ramos, T.; Ritter, N. (2012). "Statistical Methods Used in Gifted Education Journals, 2006-2010". Gifted Child Quarterly 56 (3): 134-149. doi:10.1177/0016986212444122.

[21] Drennan, Robert D. (2008). "Statistics in archaeology". In Pearsall, Deborah M. Encyclopedia of Archaeology. Elsevier Inc. pp. 2093-2100. ISBN 978-0-12-373962-9.

[22] Cohen, Jerome B. (December 1938). "Misuse of Statistics". Journal of the American Statistical Association (JSTOR) 33 (204): 657-674. doi:10.1080/01621459.1938.10502344.

[23] Freund, J. E. (1988). Modern Elementary Statistics. Credo Reference.

[24] Huff, Darrell; Geis, Irving (1954). How to Lie with Statistics. New York: Norton. "The dependability of a sample can be destroyed by [bias]... allow yourself some degree of skepticism."

[25] Willcox, Walter (1938). "The Founder of Statistics". Review of the International Statistical Institute 5 (4): 321-328. JSTOR 1400906.

[26] Franklin, J. (2002). The Science of Conjecture: Evidence and Probability before Pascal. Johns Hopkins University Press.

[27] Walker, Helen Mary (1975). Studies in the History of Statistical Method. Arno Press.

[28] Galton, F. (1877). "Typical laws of heredity". Nature 15: 492-553. doi:10.1038/015492a0.

[29] Stigler, S. M. (1989). "Francis Galton's Account of the Invention of Correlation". Statistical Science 4 (2): 73-79. doi:10.1214/ss/1177012580.

[30] Pearson, K. (1900). "On the Criterion that a given System of Deviations from the Probable in the Case of a Correlated System of Variables is such that it can be reasonably supposed to have arisen from Random Sampling". Philosophical Magazine Series 5 50 (302): 157-175. doi:10.1080/14786440009463897.

[31] "Karl Pearson (1857-1936)". Department of Statistical Science, University College London.

[32] Agresti, Alan; Hitchcock, David B. (2005). "Bayesian Inference for Categorical Data Analysis" (PDF). Statistical Methods & Applications 14 (14): 298. doi:10.1007/s10260-005-0121-y.

[33] Neyman, J. (1934). "On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection". Journal of the Royal Statistical Society 97 (4): 557-625. JSTOR 2342192.

[34] "Science in a Complex World - Big Data: Opportunity or Threat?". Santa Fe Institute.

[35] Anderson, D. R.; Sweeney, D. J.; Williams, T. A. (1994). Introduction to Statistics: Concepts and Applications. pp. 5-9. West Group. ISBN 978-0-314-03309-3.

