statistics mb0040

Upload: navneet-singh

Post on 07-Apr-2018


Category:

Documents



  • 8/4/2019 Statistics Mb0040


    MB-40

    Q-1 Statistics is the backbone of decision-making. Comment.

    a) statistics is the backbone of decision-making

    Due to advanced communication network, rapid changes in consumer behaviour, varied expectations of variety of consumers and new market

    openings, modern managers have a difficult task of making quick and

    appropriate decisions. Therefore, there is a need for them to depend more upon quantitative techniques like mathematical models, statistics,

    operations research and econometrics.

    Decision making is a key part of our day-to-day life. Even when we wish to purchase a television, we like to know the price, quality, durability, and

    maintainability of various brands and models before buying one. As you can see, in this scenario we are collecting data and making an optimum

    decision. In other words, we are using Statistics.

    Again, suppose a company wishes to introduce a new product. It has to collect data on market potential, consumer preferences, availability of raw materials,

    and the feasibility of producing the product. Hence, data collection is the backbone of any decision-making process.

    Many organisations find themselves data-rich but poor in drawing information from it. Therefore, it is important to develop the ability to extract

    meaningful information from raw data to make better decisions. Statistics play an important role in this aspect.

    Statistics is broadly divided into two main categories: descriptive statistics and inferential statistics.

    Descriptive Statistics: Descriptive statistics is used to present a general description of data, summarised quantitatively. This is mostly useful

    in clinical research, when communicating the results of experiments.

    Inferential Statistics: Inferential statistics is used to make valid inferences from the data which are helpful in effective decision making for managers or

    professionals.

    Statistical methods such as estimation, prediction and hypothesis testing belong to inferential statistics. The researchers make deductions or

    conclusions from the collected data samples regarding the characteristics of large population from which the samples are taken. So, we can say

    Statistics is the backbone of decision-making.

    b) Statistics is as good as the user. Comment.

    Statistics is used for various purposes. It is used to simplify mass data and to make comparisons easier. It is also used to bring out trends and tendencies in the data as

    well as the hidden relations between variables. All this helps to make decision making much easier. Let us look at each function of Statistics in detail.

    1. Statistics simplifies mass data

    The use of statistical concepts helps in simplification of complex data. Using statistical concepts, the managers can make decisions more easily. The statistical

    methods help in reducing the complexity of the data and consequently in the understanding of any huge mass of data.

    2. Statistics makes comparison easier

    Without using statistical methods and concepts, collection of data and comparison cannot be done easily. Statistics helps us to compare data collected from different

    sources. Grand totals, measures of central tendency, measures of dispersion, graphs and diagrams, coefficient of correlation all provide ample scopes for comparison.

    3. Statistics brings out trends and tendencies in the data

    After data is collected, it is easy to analyse the trends and tendencies in the data by using the various concepts of Statistics.

    4. Statistics brings out the hidden relations between variables

    Statistical analysis helps in drawing inferences on data. Statistical analysis brings out the hidden relations between variables.


    5. Statistics makes decision making easier

    With the proper application of Statistics and statistical software packages on the collected data, managers can take effective decisions, which can increase the profits

    in a business.

    Given all these functions, we can say that Statistics is as good as the user.

    Q-2 Distinguish between the following with examples

    a) Inclusive and exclusive class intervals

    Class intervals are of two types: exclusive and inclusive. A class interval that does not include its upper class limit is called an exclusive class interval. A class interval that includes its upper class limit is called an inclusive class interval.

    Example:

    An exclusive series is one in which the upper limit of a class is not counted in that class, for example,

    00-10

    10-20

    20-30

    30-40

    40-50

    In the first class (00-10), we consider values from 0 up to, but not including, 10. The value 10 is counted in the class 10-20. This is known as an exclusive series.

    An inclusive series is one in which both limits are included in the class, for example,

    00-09

    10-19

    20-29

    30-39

    40-49

    Here, both 00 and 09 fall in the first class (00-09), and 10 falls in the next class.
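
The difference between the two types of class interval can be sketched in a few lines of Python. This is only an illustration: the function name `assign_class` and the flag `inclusive_upper` are hypothetical names chosen here, and the class limits are the ones from the example series.

```python
def assign_class(value, classes, inclusive_upper):
    """Return the (lower, upper) class that contains value, or None."""
    for lower, upper in classes:
        if inclusive_upper:
            # Inclusive type: both limits belong to the class.
            if lower <= value <= upper:
                return (lower, upper)
        else:
            # Exclusive type: the upper limit belongs to the next class.
            if lower <= value < upper:
                return (lower, upper)
    return None


exclusive_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
inclusive_classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]

print(assign_class(10, exclusive_classes, inclusive_upper=False))  # (10, 20)
print(assign_class(9, inclusive_classes, inclusive_upper=True))    # (0, 9)
```

With the exclusive classes, the value 10 is placed in 10-20 rather than 00-10, which is exactly the rule described for the upper limit.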

    b) Continuous and discrete data

    Discrete data take on only particular values and no values in between. Data such as the number of siblings a person has or the number of cars a person owns are discrete because you can have 0 cars, 1 car, 2 cars and so on, but you cannot own 1.5 cars.

    Continuous data can take on any value in a range. Temperature and height are continuous because you can be 69.32894... inches tall; you can be any fraction of an inch tall.

    A type of data is discrete if there are only a finite number of possible values or if there is a gap on the number line between every two possible values.

    Ex. A 5 question quiz is given in a Math class. The number of correct answers on a student's quiz is an

    example of discrete data. The number of correct answers would have to be one of the following : 0, 1, 2,

    3, 4, or 5. There are not an infinite number of values, therefore this data is discrete. Also, if we were to

    draw a number line and place each possible value on it, we would see a space between each pair of

    values.

    Ex. In order to obtain a taxi license in Las Vegas, a person must pass a written exam regarding different locations in the city. The number of attempts a person needs to pass this test is also an example of discrete data. A person could take it once, or twice, or 3 times, or 4 times, and so on. So, the possible values are 1, 2, 3, ... There are infinitely many possible values, but if we were to put them on a number line, we would see a space between each pair of values.

    Discrete data usually occurs in a case where there are only a certain number of values, or when we are

    counting something (using whole numbers).


    Continuous data makes up the rest of numerical data. This is a type of data that is usually associated with

    some sort of physical measurement.

    Ex. The height of trees at a nursery is an example of continuous data. Is it possible for a tree to be 76.2" tall? Sure. How about 76.29"? Yes. How about 76.2914563782"? You betcha! The possibilities depend upon the accuracy of our measuring device.

    One general way to tell if data is continuous is to ask yourself if it is possible for the data to take on

    values that are fractions or decimals. If your answer is yes, this is usually continuous data.

    Ex. The length of time it takes for a light bulb to burn out is an example of continuous data. Could it take

    800 hours? How about 800.7? 800.7354? The answer to all 3 is yes.

    c) Qualitative and quantitative data

    Qualitative data

    Qualitative data is a categorical measurement expressed not in terms of numbers, but rather by means of a natural language description. In statistics, it

    is often used interchangeably with "categorical" data.

    For example: favorite color = "yellow"

    height = "tall"

    Although we may have categories, the categories may have a structure to them. When there is not a natural ordering of the categories, we call

    these nominal categories. Examples might be gender, race, religion, or sport.

    When the categories may be ordered, these are called ordinal variables. Categorical variables that judge size (small, medium, large, etc.) are ordinal variables. Attitudes (strongly disagree, disagree, neutral, agree, strongly agree) are also ordinal variables; however, we may not know which value is the best or worst of these. Note that the distance between these categories is not something we can measure.

    Quantitative data

    Quantitative data is a numerical measurement expressed not by means of a natural language description, but rather in terms of numbers. However, not

    all numbers are continuous and measurable. For example, the social security number is a number, but not something that one can add or subtract.

    For example: favorite color = "450 nm"

    height = "1.8 m"

    Quantitative data always are associated with a scale measure.

    Probably the most common scale type is the ratio scale. Observations of this type are on a scale that has a meaningful zero value and also has an

    equidistant measure (i.e., the difference between 10 and 20 is the same as the difference between 100 and 110). For example, a 10 year-old girl is


    twice as old as a 5 year-old girl. Since you can measure zero years, time is a ratio-scale variable. Money is another common ratio-scale quantitative

    measure. Observations that you count are usually ratio-scale (e.g., number of widgets).

    A more general quantitative measure is the interval scale. Interval scales also have an equidistant measure. However, the doubling principle breaks

    down in this scale. A temperature of 50 degrees Celsius is not "half as hot" as a temperature of 100, but a difference of 10 degrees indicates the same

    difference in temperature anywhere along the scale. The Kelvin temperature scale, however, constitutes a ratio scale because on the Kelvin scale zero

    indicates absolute zero in temperature, the complete absence of heat. So one can say, for example, that 200 degrees Kelvin is twice as hot as 100

    degrees Kelvin.
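
The contrast between the interval scale (Celsius) and the ratio scale (Kelvin) can be checked numerically. The sketch below assumes only the standard conversion K = °C + 273.15; the function and variable names are illustrative.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvins."""
    return celsius + 273.15


low_c, high_c = 50.0, 100.0

# On the Celsius scale the numbers double, but that ratio has no physical meaning.
ratio_celsius = high_c / low_c

# On the Kelvin scale (true zero) the ratio is meaningful, and it is far from 2.
ratio_kelvin = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

# Differences, by contrast, agree on both scales.
diff_celsius = high_c - low_c
diff_kelvin = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

print(ratio_celsius)           # 2.0
print(round(ratio_kelvin, 3))  # 1.155
```

So 100 degrees Celsius is only about 15% hotter than 50 degrees Celsius in the ratio sense, while the 50-degree difference is the same on either scale.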

    d) Class limits and class intervals

    Class Limits

    Class limits are the smallest and largest observations (data, events etc) in each class. Therefore, each class has two limits: a lower and upper.

    Example:

    Class      Frequency
    200-299    12
    300-399    19
    400-499    6
    500-599    2
    600-699    11
    700-799    7
    800-899    3

    Total Frequency 60

    Using the frequency table above, what are the lower and upper class limits for the first three classes?

    For the first class, 200-299

    The lower class limit is 200

    The upper class limit is 299

    For the second class, 300-399

    The lower class limit is 300

    The upper class limit is 399

    For the third class, 400-499

    The lower class limit is 400

    The upper class limit is 499

    Class Intervals

    Class interval is the difference between the upper and lower class boundaries of any class.

    Example:

    Class      Frequency
    200-299    12
    300-399    19
    400-499    6
    500-599    2
    600-699    11
    700-799    7
    800-899    3

    Total Frequency 60

    Using the table above, determine the class interval for the first class.

    For the first class, 200-299

    The class interval = Upper class boundary - Lower class boundary

    Upper class boundary = 299.5

    Lower class boundary = 199.5

    Therefore, the class interval = 299.5 - 199.5

    = 100
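
The boundary calculation above can be sketched in Python. The helper `class_boundaries` is a hypothetical name, and it assumes the usual convention that each boundary lies halfway across the gap (here 1 unit) between adjacent class limits.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Extend each class limit by half the gap between adjacent classes."""
    half = gap / 2
    return lower_limit - half, upper_limit + half


classes = [(200, 299), (300, 399), (400, 499)]

for lower, upper in classes:
    lower_boundary, upper_boundary = class_boundaries(lower, upper)
    width = upper_boundary - lower_boundary
    print(f"{lower}-{upper}: boundaries {lower_boundary} to {upper_boundary}, class interval {width}")
```

For the first class this reproduces the boundaries 199.5 and 299.5 and a class interval of 100.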

    Q-4 List the various measures of central tendency and explain the differences between them.

    Measures of Central Tendency

    Introduction

    A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As

    such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often

    called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as, the median and the

    mode.

    The mean, median and mode are all valid measures of central tendency but, under different conditions, some measures of central tendency become

    more appropriate to use than others. In the following sections we will look at the mean, mode and median and learn how to calculate them and under

    what conditions they are most appropriate to be used.

    Mean (Arithmetic)

    The mean (or average) is the most popular and well known measure of central tendency. It can be used with both discrete and continuous data,

    although its use is most often with continuous data (see our Types of Variable guide for data types). The mean is equal to the sum of all the values in

    the data set divided by the number of values in the data set. So, if we have n values in a data set and they have values $x_1, x_2, \ldots, x_n$, then the sample

    mean, usually denoted by $\bar{x}$ (pronounced "x bar"), is:

    $$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

    This formula is usually written in a slightly different manner using the Greek capital letter $\Sigma$, pronounced "sigma", which means "sum of...":

    $$\bar{x} = \frac{\sum x}{n}$$

    You may have noticed that the above formula refers to the sample mean. So, why have we called it a sample mean? This is because, in statistics, samples and populations have very different meanings and these differences are very important, even if, in the case of the mean, they are calculated in the same way. To acknowledge that we are calculating the population mean and not the sample mean, we use the Greek lower case letter "mu", denoted as $\mu$:

    $$\mu = \frac{\sum x}{n}$$

    The mean is essentially a model of your data set: a single value that stands in for all of the observations. You will notice, however, that the mean is often not one of the actual values that you have observed in your data set. One of its important properties is that it minimises error in the prediction of any one value in your data set. That is, it is the value that produces the lowest total squared error when it is used to predict every value in the data set.

    An important property of the mean is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure

    of central tendency where the sum of the deviations of each value from the mean is always zero.
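
Both properties of the mean mentioned above (deviations that sum to zero, and the smallest total squared error) can be checked on a small data set. The values below are illustrative only and do not come from the text; `squared_error` is a hypothetical helper.

```python
data = [2, 4, 4, 5, 10]  # illustrative values, not taken from the text

x_bar = sum(data) / len(data)  # sample mean: sum of values divided by n


def squared_error(centre, values):
    """Total squared error when every value is predicted by centre."""
    return sum((v - centre) ** 2 for v in values)


deviations = [v - x_bar for v in data]

print(x_bar)                       # 5.0
print(sum(deviations))             # 0.0
print(squared_error(x_bar, data))  # 36.0
print(squared_error(4, data))      # 41 (the median, 4, gives a larger error)
```

Any other candidate value, such as the median of this data set, gives a larger total squared error than the mean does.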

    When not to use the mean


    The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest

    of the data set by being especially small or large in numerical value. For example, consider the wages of staff at a factory below:

    Staff     1    2    3    4    5    6    7    8    9    10
    Salary    15k  18k  16k  14k  15k  15k  12k  17k  90k  95k

    The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean value might not be the best way to accurately

    reflect the typical salary of a worker, as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two large salaries.

    Therefore, in this situation we would like to have a better measure of central tendency. As we will find out later, taking the median would be a better

    measure of central tendency in this situation.
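
The effect of the two large salaries can be reproduced with Python's standard `statistics` module. The salary figures are the ones from the table (in thousands of dollars); the variable name is illustrative.

```python
from statistics import mean, median

# Salaries of the ten staff, in thousands of dollars, as listed in the table.
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(mean(salaries_k))    # 30.7
print(median(salaries_k))  # 15.5
```

The median of 15.5k sits inside the 12k to 18k range where most salaries fall, while the mean of 30.7k does not.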

    Another time when we usually prefer the median over the mean (or mode) is when our data is skewed (i.e. the frequency distribution for our data is

    skewed). If we consider the normal distribution - as this is the most frequently assessed in statistics - when the data is perfectly normal then the

    mean, median and mode are identical. Moreover, they all represent the most typical value in the data set. However, as the data becomes skewed the

    mean loses its ability to provide the best central location for the data as the skewed data is dragging it away from the typical value. However, the

    median best retains this position and is not as strongly influenced by the skewed values. This is explained in more detail in the skewed distribution

    section later in this guide.

    Median

    The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed

    data. In order to calculate the median, suppose we have the data below:

    65 55 89 56 35 14 56 55 87 45 92

    We first need to rearrange that data into order of magnitude (smallest first):

    14 35 45 55 55 56 56 65 87 89 92

    Our median mark is the middle mark, in this case 56. It is the middle mark because there are 5 scores before it and 5 scores

    after it. This works fine when you have an odd number of scores but what happens when you have an even number of scores? What if you had only

    10 scores? Well, you simply have to take the middle two scores and average the result. So, if we look at the example below:

    65 55 89 56 35 14 56 55 87 45

    We again rearrange that data into order of magnitude (smallest first):

    14 35 45 55 55 56 56 65 87 89

    Only now we have to take the 5th and 6th score in our data set and average them to get a median of 55.5.
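
The two cases (odd and even number of scores) can be written as one small function. This is a minimal sketch; `median_of` is a hypothetical name, and Python's built-in `statistics.median` would give the same results.

```python
def median_of(scores):
    """Middle score of the sorted data; average of the two middle scores if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]  # 11 scores
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]     # 10 scores

print(median_of(odd_scores))   # 56
print(median_of(even_scores))  # 55.5
```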

    Mode

    The mode is the most frequent score in our data set. On a bar chart or histogram it is represented by the highest bar. You can, therefore,

    sometimes consider the mode as being the most popular option. An example of a mode is presented below:


    Normally, the mode is used for categorical data where we wish to know which is the most common category as illustrated below:


    We can see above that the most common form of transport, in this particular data set, is the bus. However, one of the problems with the mode is that

    it is not unique, so it leaves us with problems when we have two or more values that share the highest frequency, such as below:


    We are now stuck as to which mode best describes the central tendency of the data. This is particularly problematic when we have continuous data, as we are more likely not to have any one value that is more frequent than the others. For example, consider measuring 30 people's weight (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same weight, e.g. 67.4 kg? The answer is: probably very unlikely. Many people might be close, but with such a small sample (30 people) and a large range of possible weights you are unlikely to find two people with exactly the same weight, that is, to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.

    Another problem with the mode is that it will not provide us with a very good measure of central tendency when the most common mark is far away from the rest of the data in the data set, as depicted in the diagram below:


    In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not representative of the data, which is mostly

    concentrated around the 20 to 30 value range. To use the mode to describe the central tendency of this data set would be misleading.
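
The point that the mode need not be unique can be illustrated with a short sketch. The transport responses below are hypothetical (the original charts are not reproduced here), and `modes` is an illustrative helper that returns every value sharing the highest frequency.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Hypothetical survey answers, invented for illustration only.
single_mode = ["bus", "car", "bus", "train", "bus", "walk"]
tied_modes = ["bus", "car", "bus", "train", "car", "walk"]

print(modes(single_mode))  # ['bus']
print(modes(tied_modes))   # ['bus', 'car']
```

When two categories tie, as in the second list, the mode alone does not single out one central value.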

    Skewed Distributions and the Mean and Median

    We often test whether our data is normally distributed as this is a common assumption underlying many statistical tests. An example of a normally

    distributed set of data is presented below:


    When you have a normally distributed sample you can legitimately use either the mean or the median as your measure of central tendency. In fact, in

    any symmetrical, single-peaked distribution the mean, median and mode are equal. However, in this situation, the mean is widely preferred as the best measure of

    central tendency as it is the measure that includes all the values in the data set for its calculation, and any change in any of the scores will affect the

    value of the mean. This is not the case with the median or mode.

    However, when our data is skewed, for example, as with the right-skewed data set below:


    we find that the mean is being dragged in the direction of the skew. In these situations, the median is generally considered to be the best representative

    of the central location of the data. The more skewed the distribution the greater the difference between the median and mean, and the greater

    emphasis should be placed on using the median as opposed to the mean. A classic example of the above right-skewed distribution is income (salary),

    where higher-earners provide a false representation of the typical income if expressed as a mean and not a median.

    If the data were expected to follow a normal distribution but tests of normality show that the data is non-normal, then it is customary to use the median instead of the mean. This is more a rule of thumb than a strict guideline, however. Sometimes, researchers wish to report the mean of a skewed distribution if the

    median and mean are not appreciably different (a subjective assessment) and if it allows easier comparisons to previous research to be made.

    Summary of when to use the mean, median and mode

    Please use the following summary table to know what the best measure of central tendency is with respect to the different types of variable.

    Type of Variable               Best measure of central tendency
    Nominal                        Mode
    Ordinal                        Median
    Interval/Ratio (not skewed)    Mean
    Interval/Ratio (skewed)        Median

    Q-5 Define population and sampling unit for selecting a random sample in each of the following cases.

    a) Hundred voters from a constituency

    b) Twenty stocks of National Stock Exchange

    c) Fifty account holders of State Bank of India



    The purpose of confidence intervals is to give a range of values, computed from sample data, that is likely to contain the population parameter with a specified level of confidence. Here's my Statistics for Dummies interpretation and example.

    Let's say that the population parameter of interest is the population average and that we construct an 80% confidence interval. The confidence level is not the probability that this one interval contains the population average. Rather, if samples were drawn from the population again and again and an interval computed from each, about 80% of those intervals would contain the population average.

    Another use is to indicate how much of the data the provider regards as reliable with a high degree of confidence; that is, the provider is more certain about one part of the data than about, say, secondary data that was also gathered. Informally, confidence intervals can be read as the expected range of outcomes.

    Null Hypothesis and Confidence Intervals

    Confidence intervals can be used to decide whether to reject a null hypothesis. If I set the confidence level for my test at 80%, I accept a 20% chance of rejecting the null hypothesis when it is in fact true. Of course I can't completely eliminate the possibility of being wrong. Toss a fair penny 100 times: each toss has a 50/50 chance of coming up heads, so about 50 heads are expected. The actual count may vary by five or so one way or the other and still be consistent with a fair coin. A confidence interval lets me state how close to 50/50 the results are likely to be and how often.
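
The penny example can be made concrete with the binomial distribution. The sketch below assumes a fair coin and 100 independent tosses; it computes the standard deviation of the number of heads and the exact probability that the count lands within five of 50. The variable names are illustrative.

```python
from math import comb, sqrt

n, p = 100, 0.5  # 100 tosses of a fair coin (assumption)

# Standard deviation of the number of heads: sqrt(n * p * (1 - p)).
std_dev = sqrt(n * p * (1 - p))

# Exact probability of seeing between 45 and 55 heads inclusive.
prob_45_to_55 = sum(comb(n, k) for k in range(45, 56)) / 2 ** n

print(std_dev)                  # 5.0
print(round(prob_45_to_55, 2))  # 0.73
```

So a swing of five heads either way is one standard deviation, and roughly 73% of such experiments would land in that range.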

    Confidence Intervals: The Skinny

    Confidence intervals give a range of plausible values for a quantity in a population, based on data gathered from repeated sampling of that population.

    For example, in weather prediction, if a certain weather condition has usually been followed by a thunderstorm in past observations, an interval estimate can express how confident the forecaster is that a storm will occur.

    Another Example: Surfing the Wave

    In Hawaii, surfing is a popular sport. The most important factors for a good day of surf are the size of a swell (wave height), the average size of swells throughout a given timeframe, and the consistency (ride length and wave direction) of a swell. How are confidence intervals used in surfing to estimate wave height? By gathering data on a range of waves, including atmospheric and climate conditions, forecasters can compute an average wave height for a given timeframe together with a confidence interval around it. That interval indicates the range in which a similar average wave height can be expected during the same timeframe on the next day.


    The confidence level tells you how sure you can be. It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. The 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.

    Q-6 What is a confidence interval, and why is it useful? What is a confidence level?

    Confidence interval

    From Wikipedia, the free encyclopedia


    In statistics, a confidence interval (CI) is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval (i.e. it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest, if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient.

    A confidence interval with a particular confidence level is intended to give the assurance that, if the statistical model is correct, then taken over all the data that might have been obtained, the procedure for constructing the interval would deliver a confidence interval that included the true value of the parameter the proportion of the time set by the confidence level. More specifically, the meaning of the term "confidence level" is that, if confidence intervals are constructed across many separate data analyses of repeated (and possibly different) experiments, the proportion of such intervals that contain the true value of the parameter will approximately match the confidence level; this is guaranteed by the reasoning underlying the construction of confidence intervals.

    A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained. (An interval intended to have such a property, called a credible interval, can be estimated using Bayesian methods; but such methods bring with them their own distinct strengths and weaknesses.)

    [Figure: bar chart in which the top ends of the bars indicate observation means and the red line segments represent the confidence intervals surrounding them.]

    Introduction

    Interval estimates can be contrasted with point estimates. A point estimate is a single value given as the estimate of a population parameter that is of interest, for example the mean of some quantity. An interval estimate specifies instead a range within which the parameter is estimated to lie.

    Confidence intervals are commonly reported in tables or graphs along with point estimates of the same parameters, to show the reliability of the estimates.

    For example, a confidence interval can be used to describe how reliable survey results are. In a poll of election voting-intentions, the result might be that 40% of respondents intend to vote for a certain party. A 90% confidence interval for the proportion in the whole population having the same intention on the survey date might be 38% to 42%. From the same data one may calculate a 95% confidence interval, which might in this case be 36% to 44%. A major factor determining the length of a confidence interval is the size of the sample used in the estimation procedure, for example the number of people taking part in a survey.
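
    As a rough illustration of how such poll intervals are obtained, the sketch below applies the usual normal-approximation (Wald) formula for a proportion. The text gives no sample size, so the figure of 1600 respondents is purely hypothetical, and `proportion_ci` is a name chosen here for illustration.

```python
from statistics import NormalDist


def proportion_ci(p_hat, n, level):
    """Normal-approximation (Wald) interval for a population proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)          # two-sided critical value
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5  # z times the standard error
    return p_hat - half_width, p_hat + half_width


n = 1600  # hypothetical number of respondents; not given in the text
for level in (0.90, 0.95):
    low, high = proportion_ci(0.40, n, level)
    print(f"{level:.0%} CI: {low:.1%} to {high:.1%}")
```

    With this assumed sample size the 90% interval comes out close to the 38% to 42% quoted above, and the 95% interval computed from the same data is wider, as a higher confidence level always requires.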

    Relationship with other statistical topics

    Statistical hypothesis testing

    Confidence intervals are closely related to statistical significance testing. For example, if for some estimated parameter $\theta$ one wants to test the null hypothesis that $\theta = 0$ against the alternative that $\theta \neq 0$, then this test can be performed by determining whether the confidence interval for $\theta$ contains 0.

    More generally, given the availability of a hypothesis testing procedure that can test the null hypothesis $\theta = \theta_0$ against the alternative that $\theta \neq \theta_0$ for any value of $\theta_0$, a confidence interval with confidence level $\gamma = 1 - \alpha$ can be defined as containing any number $\theta_0$ for which the corresponding null hypothesis is not rejected at significance level $\alpha$.[1]

    In consequence, if the estimates of two parameters (for example, the mean values of a variable in two independent groups of objects) have confidence intervals at a given $\gamma$ value that do not overlap, then the difference between the two values is significant at the corresponding value of $\alpha$. However, this test is too conservative: if two confidence intervals overlap, the two means may still be significantly different.[2][3]
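
    The point that overlapping intervals do not rule out a significant difference can be seen in a small numeric sketch. The two group estimates (0 and 3) and their standard errors (1 each) are made-up values, the helper names are hypothetical, and a two-sided z-test for independent estimates is assumed.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def ci(estimate, se, z=Z95):
    """Symmetric normal-theory interval around an estimate."""
    return estimate - z * se, estimate + z * se


def intervals_overlap(a, b):
    """True when the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def difference_significant(est1, se1, est2, se2, z=Z95):
    """Two-sided z-test for the difference of two independent estimates."""
    se_diff = (se1 ** 2 + se2 ** 2) ** 0.5
    return abs(est1 - est2) / se_diff > z


# Made-up group summaries: estimates 0 and 3, each with standard error 1.
ci_a, ci_b = ci(0.0, 1.0), ci(3.0, 1.0)
print(intervals_overlap(ci_a, ci_b))               # the intervals overlap
print(difference_significant(0.0, 1.0, 3.0, 1.0))  # yet the difference is significant
```

    The standard error of a difference of independent estimates is the square root of the sum of the squared standard errors, which is smaller than the sum of the two standard errors; that is why the overlap check is the more conservative of the two.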

    Confidence region

    Confidence regions generalize the confidence interval concept to deal with multiple quantities. Such regions can indicate not only the extent of likely sampling errors but can also reveal whether (for example) it is the case that if the estimate for one quantity is unreliable then the other is also likely to be unreliable. See also confidence bands.

    In applied practice, confidence intervals are typically stated at the 95% confidence level.[4] However, when presented graphically, confidence intervals can be shown at several confidence levels, for example 50%, 95% and 99%.

    Statistical theory

    Definition

    Let X be a random sample from a probability distribution with parameters $\theta$, which is a quantity to be estimated, and $\varphi$, representing quantities not of immediate interest. A confidence interval for the parameter $\theta$, with confidence level or confidence coefficient $\gamma$, is an interval with random endpoints (u(X), v(X)), determined by the pair of statistics (i.e., observable random variables) u(X) and v(X), with the property:

    $$\Pr_{\theta,\varphi}\bigl(u(X) < \theta < v(X)\bigr) = \gamma \quad \text{for all } (\theta, \varphi).$$

    The quantities $\varphi$ in which there is no immediate interest are called nuisance parameters, as statistical theory still needs to find some way to deal with them. The number $\gamma$, with typical values close to but not greater than 1, is sometimes given in the form $1 - \alpha$ (or as a percentage $100\% \cdot (1 - \alpha)$), where $\alpha$ is a small nonnegative number, close to 0.

    Here $\Pr_{\theta,\varphi}$ is used to indicate the probability when the random variable X has the distribution characterised by $(\theta, \varphi)$. An important part of this specification is that the random interval (u(X), v(X)) covers the unknown value $\theta$ with a high probability no matter what the true value of $\theta$ actually is.

    Note that here $\Pr_{\theta,\varphi}$ need not refer to an explicitly given parameterised family of distributions, although it often does. Just as the random variable X notionally corresponds to other possible realizations of x from the same population or from the same version of reality, the parameters $(\theta, \varphi)$ indicate that we need to consider other versions of reality in which the distribution of X might have different characteristics.

    In a specific situation, when x is the outcome of the sample X, the interval (u(x), v(x)) is also referred to as a confidence interval for $\theta$. Note that it is no longer possible to say that the (observed) interval (u(x), v(x)) has probability $\gamma$ to contain the parameter $\theta$. This observed interval is just one realization of all possible intervals for which the probability statement holds.

    Intervals for random outcomes

    Confidence intervals can be defined for random quantities as well as for fixed quantities as in the above. See prediction interval. For this, consider an additional single-valued random variable Y which may or may not be statistically dependent on X. Then the rule for constructing the interval (u(x), v(x)) provides a confidence interval for the as-yet-to-be observed value y of Y if

    $$\Pr_{\theta,\varphi}\bigl(u(X) < Y < v(X)\bigr) = \gamma \quad \text{for all } (\theta, \varphi).$$

    Here $\Pr_{\theta,\varphi}$ is used to indicate the probability over the joint distribution of the random variables (X, Y) when this is characterised by parameters $(\theta, \varphi)$.

    Approximate confidence intervals

    For non-standard applications it is sometimes not possible to find rules for constructing confidence intervals that have exactly the required properties. But practically useful intervals can still be found. The coverage probability $c(\theta, \varphi)$ for a random interval is defined by

    $$c(\theta, \varphi) = \Pr_{\theta,\varphi}\bigl(u(X) < \theta < v(X)\bigr)$$

    and the rule for constructing the interval may be accepted as providing a confidence interval if

    $$c(\theta, \varphi) \approx \gamma \quad \text{for all } (\theta, \varphi)$$

    to an acceptable level of approximation.

    Comparison to Bayesian interval estimates

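    A Bayesian interval estimate is called a credible interval. Using much of the same notation as above, the definition of a credible interval for the unknown true value of $\theta$ is, for a given $\gamma$,[5]

    $$\Pr\bigl(u(x) < \Theta < v(x) \mid X = x\bigr) = \gamma,$$

    where $\Theta$ denotes the unknown value of $\theta$ treated as a random variable.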

    Validity. This means that the nominal coverage probability (confidence level) of the confidence interval should hold, either exactly or to a good approximation.

    Optimality. This means that the rule for constructing the confidence interval should make as much use of the information in the data-set as possible. Recall that one could throw away half of a dataset and still be able to derive a valid confidence interval. One way of assessing optimality is by the length of the interval, so that a rule for constructing a confidence interval is judged better than another if it leads to intervals whose lengths are typically shorter.

    Invariance. In many applications the quantity being estimated might not be tightly defined as such. For example, a survey might result in an estimate of the median income in a population, but it might equally be considered as providing an estimate of the logarithm of the median income, given that this is a common scale for presenting graphical results. It would be desirable that the method used for constructing a confidence interval for the median income would give equivalent results when applied to constructing a confidence interval for the logarithm of the median income: specifically, the values at the ends of the latter interval would be the logarithms of the values at the ends of the former interval.

    The confidence interval is the plus-or-minus figure usually reported in newspaper or television opinion poll results. For example, if you use a confidence interval of 4 and 47% of your sample picks an answer, you can be "sure" that if you had asked the question of the entire relevant population, between 43% (47-4) and 51% (47+4) would have picked that answer.

    The confidence level tells you how sure you can be. It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. The 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.

    When you put the confidence level and the confidence interval together, you can say that you are 95% sure that the true percentage of the population is between 43% and 51%.

    The wider the confidence interval you are willing to accept, the more certain you can be that the whole population answers would be within that range. For example, if you asked a sample of 1000 people in a city which brand of cola they preferred, and 60% said Brand A, you can be very certain that between 40 and 80% of all the people in the city actually do prefer that brand, but you cannot be so sure that between 59 and 61% of the people in the city prefer the brand.
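
    The cola example can be put into numbers with the normal approximation for a sample proportion. This is only a sketch: it assumes a simple random sample, and the function name `confidence_for_margin` is chosen here for illustration.

```python
from statistics import NormalDist


def confidence_for_margin(p_hat, n, margin):
    """Confidence level attached to p_hat +/- margin (normal approximation)."""
    se = (p_hat * (1 - p_hat) / n) ** 0.5          # standard error of the proportion
    return 2 * NormalDist().cdf(margin / se) - 1   # two-sided coverage


narrow = confidence_for_margin(0.60, 1000, 0.01)   # the 59% to 61% range
wide = confidence_for_margin(0.60, 1000, 0.20)     # the 40% to 80% range
print(f"59-61%: {narrow:.2f}   40-80%: {wide:.4f}")
```

    Under these assumptions the narrow range carries a confidence level of roughly one half, while the wide range is covered with near certainty, which matches the qualitative statement above.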

    Factors that Affect Confidence Intervals

    There are three factors that determine the size of the confidence interval for a given confidence level. These are: sample size, percentage and population size.

    Sample Size

    The larger your sample, the more sure you can be that their answers truly reflect the population. This indicates that for a given confidence level, the larger your sample size, the smaller your confidence interval. However, the relationship is not linear (i.e., doubling the sample size does not halve the confidence interval).
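
    The non-linear relationship follows from the square root in the usual margin-of-error formula for a proportion. The sketch below assumes the normal approximation, the worst-case percentage of 50% and a 95% confidence level; `margin_of_error` is an illustrative name.

```python
from statistics import NormalDist


def margin_of_error(n, p=0.5, level=0.95):
    """Half-width of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * (p * (1 - p) / n) ** 0.5


for n in (500, 1000, 2000):
    print(n, f"+/- {100 * margin_of_error(n):.2f} percentage points")
```

    Doubling the sample shrinks the interval only by a factor of the square root of 2 (about 1.41); the sample has to be four times as large to halve it.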

    Percentage

    Your accuracy also depends on the percentage of your sample that picks a particular answer. If 99% of your sample said "Yes" and 1% said "No", the chances of error are remote, irrespective of sample size. However, if the percentages are 51% and 49%, the chances of error are much greater. It is easier to be sure of extreme answers than of middle-of-the-road ones.

    When determining the sample size needed for a given level of accuracy you must use the worst case percentage (50%). You should also use this percentage if you want to determine a general level of accuracy for a sample you already have. To determine the confidence interval for a specific answer your sample has given, you can use the percentage picking that answer and get a smaller interval.
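
    One common way to turn this advice into a calculation is to invert the normal-approximation margin-of-error formula. The sketch below assumes that approach and a large population; `required_sample_size` is a hypothetical helper, and the plus-or-minus 4 points is taken from the example earlier in this answer.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, p=0.5, level=0.95):
    """Smallest n whose normal-approximation margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(required_sample_size(0.04))          # worst case, p = 50%
print(required_sample_size(0.04, p=0.10))  # a more extreme percentage needs fewer people
```

    Because p(1 - p) is largest at p = 0.5, planning with 50% guarantees the target margin whatever percentage the sample eventually shows.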

    Population Size

    How many people are there in the group your sample represents? This may be the number of people in a city you are studying, the number of people who buy new cars, etc. Often you may not know the exact population size. This is not a problem. The mathematics of probability proves the size of the population is irrelevant, unless the size of the sample exceeds a few percent of the total population you are examining. This means that a sample of 500 people is equally useful in examining the opinions of a state of 15,000,000 as it would a city of 100,000. For this reason, the sample calculator ignores the population size when it is "large" or unknown. Population size is only likely to be a factor when you work with a relatively small and known group of people.
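
    The claim about population size can be illustrated with the standard finite population correction, a factor that multiplies the standard error when sampling without replacement. The correction is not mentioned in the text above and is introduced here only as an assumption for illustration; `fpc` is a hypothetical helper name, and the population of 2,000 is an added contrast case.

```python
def fpc(sample_size, population_size):
    """Finite population correction factor applied to the standard error."""
    return ((population_size - sample_size) / (population_size - 1)) ** 0.5


for population in (15_000_000, 100_000, 2_000):
    print(population, round(fpc(500, population), 4))
```

    For a sample of 500 the factor is practically 1 for both the state and the city, so the interval is essentially unchanged; it only becomes noticeably smaller than 1 once the sample is a sizeable share of the population.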

    Note: The confidence interval calculations assume you have a genuine random sample of the relevant population. If your sample is not truly random, you cannot rely on the intervals. Non-random samples usually result from some flaw in the sampling procedure. An example of such a flaw is to only call people during the day, and miss almost
