

  • ASTM Standardization News, May/June 2009, www.astm.org, page 18

    Q: Many ASTM standard test methods contain repeatability and reproducibility statements and values. What variations can be expected? What is the standard deviation for the repeatability and reproducibility of the method?

    A: The interlaboratory study (ILS) is the first step to obtaining contrasting values for the repeatability and reproducibility of a test method. It is a way to learn how well or poorly a test method behaves when performed in different situations.

    ASTM standard E691, Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method, is the basic standard describing how to perform an ILS and obtain values for the variations that one might expect for tests done at typical laboratories. As described in the preceding article of this series (see page 16 of the March/April SN), taking some number of repeats within each laboratory in a very short time by the same operator and equipment yields a best case situation, which should be the smallest variation among readings. This becomes a measure of the repeatability of measurements and is represented by calculating the repeatability standard deviation.
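    As a rough illustration of how a repeatability standard deviation can be obtained, the sketch below pools the within-laboratory variances for one material and takes the square root. The data, the lab labels and the function name `repeatability_sd` are hypothetical, and the code assumes a balanced design (the same number of repeats in every laboratory); E691 itself remains the authority on the full procedure.

```python
import math
import statistics


def repeatability_sd(results_by_lab):
    """Pool within-laboratory variances into a repeatability standard deviation.

    results_by_lab maps a lab label to its list of repeat results for one
    material. Every lab is assumed to have the same number of repeats (>= 2).
    """
    within_lab_variances = [
        statistics.variance(repeats) for repeats in results_by_lab.values()
    ]
    # The pooled variance is the plain average because the design is balanced.
    return math.sqrt(statistics.mean(within_lab_variances))


# Hypothetical data: three repeats in each of four laboratories.
example = {
    "lab_1": [10.1, 10.3, 10.2],
    "lab_2": [10.6, 10.4, 10.5],
    "lab_3": [9.9, 10.0, 10.1],
    "lab_4": [10.2, 10.2, 10.5],
}

s_r = repeatability_sd(example)
print(f"repeatability standard deviation s_r = {s_r:.4f}")
```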

    Usually only a small number of repeats for each test protocol in every laboratory is needed. By averaging results over many laboratories we learn how a typical lab might perform. Of course, all of this depends on how many laboratories participated and how well they represent the real world of laboratories. Thus, if you only have 10 laboratories involved, especially if they have been the ones developing the standard, it may be questionable to assume that any other random laboratory would perform as well. This could certainly be the case for new methods.

    When we perform the test method in many different laboratories on the same material, we hope to discover all of the potential variations that can occur when the test method is used. Because we now have different operators, different equipment and different environmental conditions, all of the intermediate conditions and more will have been introduced. Thus, we should expect to have greater variability among the results from different laboratories. The measure of this larger variation due to readings taken among laboratories is found as the reproducibility standard deviation.
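    The relationship between the two standard deviations can be sketched in code. The example below assumes the usual balanced-design calculation: the variance of the laboratory averages is corrected for the repeatability contribution to give a between-laboratory component, which is then added to the repeatability variance. The data and the name `precision_sds` are hypothetical; treat this as an illustration under those assumptions rather than a substitute for the E691 procedure.

```python
import math
import statistics


def precision_sds(results_by_lab):
    """Return (s_r, s_R) for one material from balanced repeat data."""
    labs = list(results_by_lab.values())
    n = len(labs[0])  # repeats per lab, assumed equal across labs
    if n < 2 or any(len(repeats) != n for repeats in labs):
        raise ValueError("each lab needs the same number (>= 2) of repeats")

    # Repeatability variance: average of the within-lab variances.
    s_r_sq = statistics.mean([statistics.variance(repeats) for repeats in labs])

    # Variance of the lab averages, which mixes lab-to-lab and repeat variation.
    lab_means = [statistics.mean(repeats) for repeats in labs]
    s_xbar_sq = statistics.variance(lab_means)

    # Between-lab component, floored at zero so that s_R is never below s_r.
    s_L_sq = max(0.0, s_xbar_sq - s_r_sq / n)

    return math.sqrt(s_r_sq), math.sqrt(s_L_sq + s_r_sq)


# Hypothetical data: three repeats in each of four laboratories.
example = {
    "lab_1": [10.1, 10.3, 10.2],
    "lab_2": [10.6, 10.4, 10.5],
    "lab_3": [9.9, 10.0, 10.1],
    "lab_4": [10.2, 10.2, 10.5],
}

s_r, s_R = precision_sds(example)
print(f"s_r = {s_r:.4f}, s_R = {s_R:.4f}")
```

    Flooring the between-laboratory component at zero is what makes the two values coincide when the laboratories agree as closely as repeats within one laboratory do.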

    Again, the problem of interpreting this variation is that it is dependent on how many laboratories have participated. When only a very small sample of all the labs that might run the method is used, you must be cautious in believing that these results are typical of what all laboratories might have done with the same test. In addition, it is also important to look at the results to see if some laboratories consistently perform differently. Often the major reason for variation among laboratories is a consequence of some type of bias, or systematic difference, that occurs for one or more of the labs. This is especially a problem to recognize when only a very small number of labs actually participated (say, fewer than 10).
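    One simple way to look for laboratories that consistently perform differently is to express each laboratory average as a number of standard deviations from the overall average. The sketch below assumes the between-laboratory consistency statistic usually called h (laboratory average minus the average of all laboratory averages, divided by the standard deviation of the laboratory averages). The data and function name are hypothetical, and the critical values against which h is judged should be taken from the standard, not from this example.

```python
import statistics


def h_statistics(results_by_lab):
    """Return a dict of between-laboratory consistency values, one per lab."""
    lab_means = {
        lab: statistics.mean(repeats) for lab, repeats in results_by_lab.items()
    }
    means = list(lab_means.values())
    grand_mean = statistics.mean(means)
    s_xbar = statistics.stdev(means)  # standard deviation of the lab averages
    return {lab: (mean - grand_mean) / s_xbar for lab, mean in lab_means.items()}


# Hypothetical data: three repeats in each of four laboratories.
example = {
    "lab_1": [10.1, 10.3, 10.2],
    "lab_2": [10.6, 10.4, 10.5],
    "lab_3": [9.9, 10.0, 10.1],
    "lab_4": [10.2, 10.2, 10.5],
}

h = h_statistics(example)
for lab, value in sorted(h.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"{lab}: h = {value:+.2f}")
```

    A laboratory whose h value has the same sign and a large magnitude across several materials is a candidate for the kind of systematic difference described above.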

    If we use the terms repeatability and reproducibility as describing the nature of the variation, then that variation is best computed as a standard deviation. Let’s take a closer look at how these terms are defined in an ILS.

    What Are Repeatability and Reproducibility? Part 2: The E11 Viewpoint¹
    By Neil Ullman
    DataPoints: A Statistics Q&A

    When the terms r and R were first presented, the idea was to provide a simple approximate comparison for a very special case of using the results of the ILS. The value of r, called the repeatability interval, is found by simply multiplying the repeatability standard deviation by 2.8; it is similar to the statistical estimate of a 95 percent confidence interval for the difference between two readings. So, by using r we reduce the statistical jargon. The same goes for R, which is the reproducibility standard deviation times 2.8. With these calculations we arrive at a reproducibility interval, which we then use to compare the difference between a pair of actual test results that we might observe from two labs.
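    The factor 2.8 can be traced with a short derivation. This is a sketch that assumes two independent, normally distributed readings $x_1$ and $x_2$ sharing the same standard deviation $\sigma$ (the symbols are introduced here for illustration and do not appear in the article):

```latex
\begin{align*}
\operatorname{Var}(x_1 - x_2) &= \sigma^2 + \sigma^2 = 2\sigma^2, \\
\operatorname{SD}(x_1 - x_2) &= \sqrt{2}\,\sigma, \\
P\bigl(|x_1 - x_2| \le 1.96\,\sqrt{2}\,\sigma\bigr) &\approx 0.95, \\
1.96\,\sqrt{2} &\approx 2.77 \approx 2.8 .
\end{align*}
```

    Substituting the repeatability standard deviation for $\sigma$ gives r, and substituting the reproducibility standard deviation gives R.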

    These interval types assume the following:

    • The data are normally distributed; that is, the laboratories chosen are randomly picked from a distribution of labs that have approximately the same variation as the ILS results.
    • We are only comparing two independent readings; that is, either two repeats by a single operator or single tests conducted in only two laboratories.
    • That when we see a difference greater than the interval, there is a 5 percent chance that it is actually due to random chance, not an actual difference in the testing process.
    • That the actual tests we look at fall in the range of those performed in the ILS.

    A few comments should also be noted:

    • As mentioned earlier, serious biases may be involved in tests performed in the ILS, especially when only a small number of labs were used, which calls into question how heavily one should rely on the estimate for a particular sample.
    • E691 currently describes the case where two samples are taken. The International Organization for Standardization’s ILS standard, ISO 5725, Accuracy (Trueness and Precision) of Measurement Methods and Results, suggests ways to compare different size samples either for repeatability or reproducibility based on modifying the multiplier of the standard deviation.
    • In addition, if you want to reduce the chance of making the error of declaring a difference in results from the 5 percent value to, say, only 1 percent, then the multiplier would need to change from 2.8 to about 3.6. Even more important is the repeated use of the comparison of samples. For example, if you make many paired comparisons, the chance that one would randomly be different rapidly increases.
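    Both of the last points can be checked numerically. The sketch below assumes normally distributed, independent readings; the function names are hypothetical. The first function gives the multiplier for a chosen confidence level, and the second gives the chance that at least one of several independent paired comparisons exceeds its limit purely by chance.

```python
import math
from statistics import NormalDist


def difference_multiplier(confidence):
    """Multiplier k so that |x1 - x2| <= k * sd with the given probability."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided normal quantile
    return z * math.sqrt(2.0)  # sqrt(2) because a difference of two readings


def chance_of_false_alarm(n_comparisons, alpha=0.05):
    """Chance that at least one of n independent comparisons exceeds its limit."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


for level in (0.95, 0.99):
    print(f"{level:.0%} level: multiplier {difference_multiplier(level):.2f}")

for n in (1, 5, 10, 20):
    print(f"{n:2d} comparisons: {chance_of_false_alarm(n):.0%} chance of a false alarm")
```

    Under these assumptions the 99 percent multiplier comes out near 3.6, in line with the text, and ten independent comparisons at the 5 percent level already carry roughly a 40 percent chance of at least one spurious difference.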

    EXAMPLE: E691 GLUCOSE IN SERUM STUDY

    For a glucose in serum study, eight laboratories tested five different reference samples of blood, ranging from low sugar levels to very high, and measured each material three times. The table below gives the key summary statistics (Table 11 in E691). The first column of values contains the sample material averages, followed by the repeatability standard deviation, sr, and then the reproducibility standard deviation, sR. In all but the first case, sR is larger than sr. Recall that r is 2.8 times sr, and R is 2.8 times sR.

    Material    Average     sr       sR       r       R
    A            41.5183   1.0632   1.0632    2.98    2.98
    B            79.6796   1.4949   1.5796    4.19    4.42
    C           134.7264   1.5434   2.1482    4.33    6.02
    D           194.7170   2.6251   3.3657    7.35    9.42
    E           294.4920   3.9350   4.1923   11.02   11.74

    To interpret these values of standard deviation: if we had a reading of about 135 (material C), and we had a single operator in one laboratory run many tests on that material, then 95 percent of the readings would fall within a range of approximately ±3.0 units (about 1.96 times the repeatability standard deviation, or a total range of about 6.0 units). But if only two readings were run at random, then 95 percent of the time the difference between those two readings should not be more than 4.33 units (the value of r for material C). Similarly, if many laboratories each ran a single test, then 95 percent of the single readings would fall in a range of about 8.6 units, but pairs of readings would rarely have a difference of greater than 6.02 units.
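    The interpretation for material C can be reproduced from the tabulated values. In the sketch below the table figures come from the article, while the two readings being compared (133.1 and 138.2) and the helper names are hypothetical. The products 2.8 × sr and 2.8 × sR agree with the tabulated r and R to within about 0.01, presumably because the table was computed from unrounded standard deviations.

```python
# (average, s_r, s_R, r, R) for each material, as tabulated in the article.
TABLE = {
    "A": (41.5183, 1.0632, 1.0632, 2.98, 2.98),
    "B": (79.6796, 1.4949, 1.5796, 4.19, 4.42),
    "C": (134.7264, 1.5434, 2.1482, 4.33, 6.02),
    "D": (194.7170, 2.6251, 3.3657, 7.35, 9.42),
    "E": (294.4920, 3.9350, 4.1923, 11.02, 11.74),
}


def single_reading_half_width(material):
    """Half-width of the 95 percent band for one operator's repeated readings."""
    _, s_r, _, _, _ = TABLE[material]
    return 1.96 * s_r


def pair_exceeds_limit(material, first, second, same_lab):
    """True when two readings differ by more than r (same lab) or R (two labs)."""
    _, _, _, r, R = TABLE[material]
    limit = r if same_lab else R
    return abs(first - second) > limit


print(f"material C, single-operator band: +/- {single_reading_half_width('C'):.1f}")

# Hypothetical pair of readings on material C, 5.1 units apart.
print("same operator:", pair_exceeds_limit("C", 133.1, 138.2, same_lab=True))
print("two laboratories:", pair_exceeds_limit("C", 133.1, 138.2, same_lab=False))
```

    The same 5.1-unit difference is larger than r but smaller than R, so it would be notable between two repeats by one operator yet unremarkable between two laboratories.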

    Next Issue: Part 3 — Repeatability and reproducibility in measurement systems analysis, or “gage R&R” methodology.

    Reference

    1. ASTM Committee E11 on Quality and Statistics

    Neil Ullman is a retired professor of mathematics and mechanical technology and a past chair of Committee E11. Currently chair of Subcommittee E11.20 on Test Method Evaluation and Quality Control, he is a fellow of ASTM and the American Society for Quality. He recently received the E11 Harold F. Dodge Award.

    Dean Neubauer is the DataPoints column coordinator and E11.90.03 publications chair.

    Statistics play an important role in the ASTM International standards you write, such as the development of precision and bias statements for test methods, running interlaboratory studies, knowing how to round numbers properly and determining sample size. A panel of experts is ready to answer your questions about how to use statistical principles in ASTM standards. Please send your questions to SN Editor in Chief Maryann Gorman at [email protected] or ASTM International, 100 Barr Harbor Drive, P.O. Box C700, W. Conshohocken, PA 19428-2959.

    Bonus Q&A

    Q: Regarding the use of the intermediate precision condition as shown in the March/April DataPoints column: is there a manipulation of one of these calculations to be used when one is looking for limits relating to the intermediate precision condition? - Jeff Monson

    A: The short answer is yes. Intermediate precision (R’) is defined in ASTM standard D6299, Practice for Applying Statistical Quality Assurance and Control Charting Techniques to Evaluate Analytical Measurement System Performance, as 2.77 times the standard deviation of a lab’s quality control test results obtained under site precision conditions. A typical application of R’ would be to determine if the retained material from a particular batch of released product has degraded over time. For that application, a lab will test the retained material and compare the test result with the original test result. If the difference exceeds R’, then there is strong evidence that the retained material has degraded, subject to the assumption that the test method is in statistical control when the original and retain test results are obtained, and the QC material that R’ is based on is compositionally similar to the retain.

    — Alex Lau
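    The retained-material check described in this answer can be sketched as follows. The 2.77 multiplier is the one quoted above; the QC results, the test values and the function names are hypothetical, and the sketch takes for granted the stated assumptions (a test method in statistical control and a compositionally similar QC material).

```python
import statistics


def intermediate_precision(qc_results, multiplier=2.77):
    """R' as the multiplier times the standard deviation of a lab's QC results."""
    return multiplier * statistics.stdev(qc_results)


def has_likely_degraded(original_result, retained_result, r_prime):
    """True when the retained material differs from the original by more than R'."""
    return abs(original_result - retained_result) > r_prime


# Hypothetical QC results gathered under site precision conditions.
qc_results = [50.2, 49.8, 50.1, 50.4, 49.9, 50.0, 50.3, 49.7]

r_prime = intermediate_precision(qc_results)
print(f"R' = {r_prime:.2f}")
print("0.9 units apart:", has_likely_degraded(50.1, 49.2, r_prime))
print("0.3 units apart:", has_likely_degraded(50.1, 49.8, r_prime))
```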

