
Dissertation Defense

Machine Assisted Grading of Rare Collectibles through the COINS Framework

By: Rick Bassett

July 24, 2003

Grading is…

• Determining the condition (or grade) of an item. This is important as the grade is a major contributing factor in the value of a collectible item.

The problem is…

• That many expert graders argue that grading is an art and not an exact science, while others take the opposite position.

General Research Framework

• This research takes the position that effective grading needs to be both an art and a science.

Science (objective) – represents the undistorted view of a grader: free of emotion and personal bias, and based on observable phenomena, appraisal and evidence rather than on personal feelings or interpretation.

Art (subjective) – is the judgment exercised by a grader that may be based on that individual’s personal impressions, visual observations, cognitive abilities, past experiences, feelings and opinions rather than objective external facts.

Research Goals

1. To develop an automated machine-based technical grading system that produces results which are consistent and reliable enough to be used as a baseline grade for subjective grading by expert graders.

2. To determine whether having humans apply their subjective grading opinions to the technical machine-based grade enhances the grading experience above that of human-only or machine-only grading.

COINS Model

Computer-based Objective Interactive Numismatic System

• The first component is a machine-based system that performs objective grading and yields a technical grade as a result.

• The second component of the COINS model uses the output from the machine-based component as the baseline grade for a subjective human/machine evaluation process.

COINS Model Purpose

• To provide grading experts with a reliable baseline machine generated technical grade as a starting point for grading so that they can then layer on their own subjective interpretations and ultimately arrive at a market grade. COINS is intended to enhance the ability of graders.

COINS Model – Part I

Objective (Science) – based on hard facts

To explore the objective aspects of grading, a machine-based grader was developed.

This automated grader allows us to submit a scanned GIF image for comparison against a trained database of image representations.

COINS Model – Part I Objective (Science) – based on hard facts

Using an algorithm known as Histogram Distance Measurement, the system returns the image in the database that most closely matches the image being evaluated.

Technical Grade: The machine-based system is objective and consistent, as it can focus on the hard facts when evaluating collectibles.
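As a rough illustration of the matching step described above, the sketch below builds a grey-level histogram for each image and returns the database entry whose histogram is closest to that of the scanned image. The source does not say which distance measure, bin count, or image representation COINS uses, so the L1 (city-block) distance, the 8-bin histograms, the flat pixel lists, the grade labels, and every function name here are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def histogram(pixels: Sequence[int], bins: int = 8, levels: int = 256) -> List[float]:
    """Normalized grey-level histogram of a flat sequence of pixel values."""
    if not pixels:
        raise ValueError("an image needs at least one pixel")
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // levels] += 1
    return [count / len(pixels) for count in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two histograms (assumed measure, not from the source)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def closest_match(query: Sequence[int],
                  database: Dict[str, Sequence[int]]) -> Tuple[str, float]:
    """Return the label of the database image nearest to the query, and its distance."""
    query_hist = histogram(query)
    best_label, best_distance = "", float("inf")
    for label, pixels in database.items():
        distance = histogram_distance(query_hist, histogram(pixels))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


# Hypothetical trained database: grade label -> grey-level pixel values.
trained = {
    "MS-65": [200, 210, 220, 230, 240, 250],
    "VF-20": [90, 100, 110, 120, 130, 140],
    "G-4": [10, 20, 30, 40, 50, 60],
}
scanned = [95, 105, 115, 125, 135, 165]  # hypothetical scanned image

label, distance = closest_match(scanned, trained)
print(label, round(distance, 3))
```

The label of the nearest database image plays the role of the technical grade; because the comparison is purely numeric, the same scan always yields the same result.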

COINS Model – Part II

Subjective (Art) – based on interpretation plus hard facts

To explore the subjective aspects of grading, a browser-based grader was developed.

Market Grade: This browser-based grading tool allows expert human grading consultants to provide their subjective opinion of what the grade is.

Experimentation

Two series of experiments were developed to explore the research questions:

1. Objective: Machine-based experiments

2. Subjective: Human-Machine based experiments

Machine Based Experiments

• Hypothesis: The larger the trained database that the machine works from, the better the machine will perform.

Three experiments (A, B & C) were performed to test this hypothesis.

Machine Based Experiments

Figure: Machine Reliability. Error rate (unexplained variance), on an axis from 0.000 to 0.070, plotted against the number of coins in the database (55, 65, 85).

As the number of images in the database increased in these 3 experiments the error rate decreased
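The chart labels the error rate as unexplained variance. One common reading of that label, assumed here rather than confirmed by the source, is 1 − r², where r is the Pearson correlation between the machine grades and a set of reference grades. The sketch below computes that quantity for made-up numbers; the grade values and function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


def error_rate(machine: Sequence[float], reference: Sequence[float]) -> float:
    """Unexplained variance, taken here as 1 - r^2 (an assumed definition)."""
    r = pearson_r(machine, reference)
    return 1.0 - r * r


# Hypothetical grades for five coins.
machine_grades = [60, 63, 65, 58, 64]
reference_grades = [61, 63, 66, 58, 63]

explained = pearson_r(machine_grades, reference_grades) ** 2
unexplained = error_rate(machine_grades, reference_grades)
print(f"explained variance: {explained:.3f}, error rate: {unexplained:.3f}")
```

Under this reading, explained variance and error rate always sum to 1, so a falling error rate is the same statement as a rising correlation with the reference grades.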

Machine Based Experiments

The hypothesis was tested further in tests D & E.

Test D is the most exhaustive and most rigorous of the 5 tests.

Compared to experiments A, B & C

Machine Based Experiments

The hypothesis was tested further in tests D & E.

Test E – Also exhaustive, it compared the coins graded by the 3rd party grading services.


Human-Machine Experiments

Experts Acceptance Testing #1

• Hypothesis: The correlation will be lower than that seen between the machine grades and those of the services on the twenty selected coins.

The substantive reason is that subjective factors are now brought into active consideration. The statistical reason is that with a greater number of human graders, there are bound to be some with idiosyncratic standards.

Human-Machine Experiments

Advance Grade Knowledge Testing #2

• Hypothesis: Online grading experts who believe that they are provided with a reliable machine-based grade will do better at grading than those graders who are not provided an accurate grade.

The substantive reason is that the expert graders should be able to use the grade provided as a reasonable baseline or starting point.

Figure: Advance Grade Knowledge. Explained variance and error rate by type of grade provided, on an axis from 0.000 to 1.000. The chart marks the lowest and highest values.

Grade provided      Explained Variance   Error Rate
Machine Grade       0.925                0.075
No Grade            0.681                0.319
Misleading Grade    0.893                0.107

Human-Machine Experiments

The Importance of Subjective Qualities Testing #3

• Hypothesis: Identifying which subjective qualities the expert graders rely on most will yield less than impressive statistical significance.

Expert graders often claim that grading is an art and not a science by referring to certain subjective qualities that are difficult to measure on a repeatable basis.

The vast majority of graders failed to identify the incorporation of subjective qualities into their decision process.

Human-Machine Experiments

Grader Consensus Testing #4

• Hypothesis: Expert grading consultants will have widely diverse grading opinions on the images that they are grading.

Informal experience that humans grade inconsistently is supported by the Stujoe grading tests, the Coin World tests, and the Kevin Foley tests on human grading.
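One simple way to quantify how widely graders diverge on each image, offered here as an illustrative assumption rather than the analysis used in the study, is the standard deviation of the grades submitted for that image. The coin identifiers and grade values below are invented.

```python
from statistics import pstdev
from typing import Dict, List


def grade_spread(grades_by_coin: Dict[str, List[float]]) -> Dict[str, float]:
    """Population standard deviation of the submitted grades, per coin."""
    return {coin: pstdev(grades) for coin, grades in grades_by_coin.items()}


# Hypothetical grades submitted by four graders for two coin images.
submitted = {
    "coin_01": [63, 64, 63, 65],
    "coin_02": [20, 35, 12, 40],
}

spread = grade_spread(submitted)
least_agreed = max(spread, key=spread.get)  # coin with the widest disagreement
print(spread, least_agreed)
```

A larger spread means less consensus among the graders on that coin.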

Human-Machine Experiments

Duplicate Grade Testing #5

• Hypothesis: Given that it was previously hypothesized that humans provide widely divergent grading opinions, it is expected that graders will offer differing grading opinions on coins which they have previously graded.

The expectation was that there would be a certain amount of change that occurred in the grading of the duplicates but the hope was that the change, or variance, would not be significant.

Lower grade coins generally provide more challenges to graders

Human-Machine Experiments

Perception of Internet-based Grading Test #6

• Hypothesis: Expert graders would see the value in grading images over the Internet although they would rather grade from the actual coins.

The graders that participated in this test are predisposed to the use of technology as they are on the Internet.
