ROI & Impact - Quantitative & Qualitative Measures for Taxonomies
DESCRIPTION
A presentation by Dr. Jay Ven Eman, CEO of Access Innovations, Inc., on measuring the financial benefits of taxonomies. First presented at the 2009 Data Harmony Users Group meeting.
TRANSCRIPT
![Page 1: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/1.jpg)
ROI & Impact: Quantitative & Qualitative Measures for Taxonomies
Wednesday, 11 February 2009
12:00 – 12:30 PM MST
Presented by Jay Ven Eman, Ph.D., CEO
Access Innovations, Inc. / Data Harmony
505.998.0800 / www.accessinn.com / www.dataharmony.com
DHUG 2009
![Page 2: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/2.jpg)
First, some questions
Do you know what a taxonomy is?
Does your boss’s boss know? Care?
What are YOU trying to accomplish?
What are your objectives?
What isn’t working? What is?
How badly? How much? Who? Where?
Copyright 2007 Access Innovations, Inc.
![Page 3: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/3.jpg)
First, some questions - 2
Who are your searchers?
Internal? Intranet?
External? Web? Fee-based (commercial)?
How many?
What do they do? How do they do it?
What are they seeking? Why?
![Page 4: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/4.jpg)
First, some questions - 3
Where are they looking?
How many searching environments?
Physical?
Internal resources?
External resources?
Search interfaces?
And so on…
![Page 5: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/5.jpg)
“Meaning” starts with a knowledge organization system (KOS)
Not complex - $
Uncontrolled list
Name authority file
Synonym set/ring
Controlled vocabulary
Taxonomy
Thesaurus
Topic Map
Ontology
SKOS
Highly complex - $$$$
LOTS OF OVERLAP!
![Page 6: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/6.jpg)
The Pain of Search
| Mission critical | Percent | Number of Employees | Search & Use Time Per Week (Hours) | Time Searching Per Week (Hours) | Time Analysing Per Week (Hours) | Average Loaded Salary ($ Per Hour) | Annual Cost of Looking ($) | Search Time Reduction, 10% ($) | Difference ($) |
|---|---|---|---|---|---|---|---|---|---|
| High | 10 | 100 | 14 | 8.4 | 5.6 | 200 | 8,736,000 | 7,862,400 | 873,600 |
| Medium | 80 | 800 | 12 | 7.2 | 4.8 | 150 | 44,928,000 | 40,435,200 | 4,492,800 |
| Low | 10 | 100 | 10 | 6 | 4 | 100 | 3,120,000 | 2,808,000 | 312,000 |
| Total | | 1,000 | | | | | 56,784,000 | 51,105,600 | 5,678,400 |
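The Pain of Search figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the names `Tier`, `annual_cost_of_looking`, and `pain_of_search` are hypothetical, and it assumes a 52-week year and that only the hours spent searching (not analysing) count toward the annual cost, which is the reading that matches the slide's totals.

```python
from dataclasses import dataclass

WEEKS_PER_YEAR = 52  # assumption: this value reproduces the slide's totals


@dataclass
class Tier:
    """One row of the Pain of Search table (hypothetical structure)."""
    name: str
    employees: int
    search_hours_per_week: float
    loaded_rate: float  # average loaded salary, $ per hour


def annual_cost_of_looking(tier: Tier) -> float:
    """Employees x weekly search hours x weeks per year x hourly rate."""
    return (tier.employees * tier.search_hours_per_week
            * WEEKS_PER_YEAR * tier.loaded_rate)


def pain_of_search(tiers, reduction=0.10):
    """Return (current cost, cost after reduction, difference)."""
    total = sum(annual_cost_of_looking(t) for t in tiers)
    saved = total * reduction
    return total, total - saved, saved


# Rows taken from the slide.
TIERS = [
    Tier("High", 100, 8.4, 200),
    Tier("Medium", 800, 7.2, 150),
    Tier("Low", 100, 6.0, 100),
]

if __name__ == "__main__":
    total, reduced, saved = pain_of_search(TIERS)
    print(f"Annual cost of looking: ${total:,.0f}")
    print(f"After 10% reduction:    ${reduced:,.0f}")
    print(f"Difference:             ${saved:,.0f}")
```

Changing `reduction` or the per-tier hours shows how sensitive the headline difference is to the survey inputs.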
![Page 7: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/7.jpg)
ROI - Segments
Cost of taxonomy system
Indexing costs
Cost of getting system ready
Ongoing maintenance
Increased efficiency
Increased quality of retrieval
Cost of legacy system maintenance
![Page 8: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/8.jpg)
Copyright © 2005 Access Innovations, Inc.
Taxonomy construction

| Process | Terms/hr | # of terms | Cost/hr | Cost |
|---|---|---|---|---|
| From scratch | 4 | 5,000 | $75 | $93,750 |
| License | | | | $0 - $100K |
| License & customize | 6 | 5,000 | $75 | $62,500 + $5,000 |
| Auto-generate/cleanup + tool | 6 | 5,000 | $75 | $62,500 + $100,000 |
| Mapping | 8 | 5,000 | $75 | $46,875 |
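The labor figures in the taxonomy construction table follow from terms divided by terms per hour, times the hourly rate. A minimal sketch, assuming that reading; `construction_cost` and its `fixed_fee` parameter (for a license or tool purchase) are hypothetical names, not part of any Data Harmony product.

```python
def construction_cost(terms, terms_per_hour, hourly_rate, fixed_fee=0.0):
    """Labor cost of building a taxonomy plus any one-off license/tool fee."""
    hours = terms / terms_per_hour
    return hours * hourly_rate + fixed_fee


if __name__ == "__main__":
    # Inputs taken from the slide: 5,000 terms at $75 per hour.
    options = {
        "From scratch": construction_cost(5000, 4, 75),
        "License & customize": construction_cost(5000, 6, 75, fixed_fee=5000),
        "Auto-generate/cleanup + tool": construction_cost(5000, 6, 75, fixed_fee=100000),
        "Mapping": construction_cost(5000, 8, 75),
    }
    for name, cost in options.items():
        print(f"{name}: ${cost:,.0f}")
```

Under these assumptions the "License & customize" row totals $67,500.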
![Page 9: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/9.jpg)
Indexing & Search Metrics
Hit, Miss, Noise
Subjective
Relevance
Aboutness
Statistical
Precision
Recall
Level of effort
![Page 10: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/10.jpg)
Hit, Miss, Noise
Hit – exactly what a human indexer would use
Miss – human indexer would use but system did not assign
Noise – system assigned but human did not
Relevant noise – could have been assigned
Irrelevant noise – just plain wrong
![Page 11: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/11.jpg)
Subjective
Relevance
Reflects how closely the result matches the user's request
Aboutness
Reflects the topical match between the document content and the term
How well the topic describes what the document is about
Varies with level of conceptual terms vs. factual terms in the thesaurus
![Page 12: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/12.jpg)
Subjective
“There is now a 92% accuracy rating on accounting and regulatory document search based on hit, miss and noise or relevance, precision and recall statistics…Access Innovations.” USGAO
“IEEE had their system up and running in three days, in full production in less than two weeks.” Institute of Electrical and Electronics Engineers (IEEE)
“The American Economic Association said its editors think using it is fun and makes time fly!” American Economic Association (AEA)
“ProQuest CSA have achieved a 7-fold increase in productivity – thus they have four licenses.” ProQuest CSA
“Weather Channel finds things 50% faster using Data Harmony. A significant saving in time.” The Weather Channel
![Page 13: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/13.jpg)
Statistical
Precision
Correct retrieval / Total retrieval
Hits / (Hits + Noise)
Recall
Correct retrieval / Total correct in system
Hits / (Hits + Misses)
Level of effort
Hits / (Hits + Misses + Noise)
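The three statistical measures can be computed directly from hit, miss, and noise counts gathered by comparing system-assigned terms against a human indexer's terms. A minimal sketch; the function name `indexing_metrics` and the sample counts are hypothetical.

```python
def indexing_metrics(hits, misses, noise):
    """Return (precision, recall, level_of_effort) from raw counts.

    hits   - terms the system assigned that a human indexer would also use
    misses - terms the human would use but the system did not assign
    noise  - terms the system assigned but the human did not
    """
    def ratio(numerator, denominator):
        # Guard against empty samples rather than dividing by zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(hits, hits + noise)
    recall = ratio(hits, hits + misses)
    level_of_effort = ratio(hits, hits + misses + noise)
    return precision, recall, level_of_effort


if __name__ == "__main__":
    # Hypothetical sample: 85 hits, 10 misses, 15 noise terms.
    p, r, e = indexing_metrics(hits=85, misses=10, noise=15)
    print(f"precision={p:.3f} recall={r:.3f} level_of_effort={e:.3f}")
```

Level of effort penalizes both misses and noise, so it is never higher than precision or recall for the same counts.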
![Page 14: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/14.jpg)
Cost Goals
Cost Savings
Software/hardware
More efficient delivery systems
Retirement of legacy systems
Cost Avoidance
Additional staff not needed to scale
Lower training costs
![Page 15: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/15.jpg)
Productivity Goals
Productivity gains
Employee productivity – fourfold
Get up to speed faster
Learn vocabulary faster
Able to capture people's knowledge in the rule base
Staff savings / redeployment
Elimination of new hires
![Page 16: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/16.jpg)
Additional Benefits
Revenue Generation
Higher hit rates
More purchases off the site
Competitive advantage
Shorter product / sales cycles
Faster implementation
Better search experience
Ability to meet regulatory requirements
![Page 17: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/17.jpg)
Go – No Go
Reach 85% precision to launch for productivity - assisted
Reach 85% for filtering or categorization
Sorting for production
Level of effort to get to 85%
Integration into the workflow is efficient
![Page 18: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/18.jpg)
Benchmarks
15 – 20% irrelevant returns / noise
Amount of work needed to achieve 85% level
How good is good enough?
Satisfice = satisfy + suffice
How much error can you put up with?
![Page 19: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/19.jpg)
Example ROI Calculation
Assume – 5,000 term thesaurus
1.5 synonyms per term
7,500 terms total
Assume 85% accuracy
Use assisted for indexing
Use automatically for filtering
Assume $75 per hour for staff
Assume 10,000 records for test batch
![Page 20: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/20.jpg)
Indexing costs with Data Harmony
80% of rules built automatically
7,500 x .8 = 6,000
20% require complex rules
Average rule takes 5 minutes (Actually MUCH faster using M.A.I. GUI)
5 x 1,500 = 7,500 minutes
125 hours x $75 = $9,375
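The rule-building estimate above is a short chain of multiplications. The sketch below reproduces it; `rule_building_cost` is a hypothetical helper, and the automatic share is passed as a whole-number percentage purely to keep the arithmetic exact.

```python
def rule_building_cost(total_terms, auto_percent, minutes_per_rule, hourly_rate):
    """Return (manual_rules, hours, cost) for rules that need hand editing."""
    manual_rules = total_terms * (100 - auto_percent) / 100
    hours = manual_rules * minutes_per_rule / 60
    return manual_rules, hours, hours * hourly_rate


if __name__ == "__main__":
    # Slide inputs: 7,500 terms, 80% automatic, 5 minutes per rule, $75/hour.
    manual, hours, cost = rule_building_cost(7500, 80, 5, 75)
    print(f"{manual:,.0f} manual rules, {hours:,.0f} hours, ${cost:,.0f}")
```

Because the slide notes that rules are actually built much faster than 5 minutes each, lowering `minutes_per_rule` gives a less conservative estimate.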
![Page 21: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/21.jpg)
Indexing Costs
Base cost of MAIstro EE - $60,000
Cost of getting system ready
Programming support and integration
Estimated at 2 weeks programming at $125 / hour = $10,000
Rule building
Estimated at 125 hours at $75 / hour = $9,375
Possible need to re-run training set several times
Ongoing maintenance
Estimated at 15% of purchase price for license = $9,000
Rule building for new terms
50 terms per quarter = 200 terms per year
200 terms x .8 = 160 automatic
40 at 5 minutes per term = 200 minutes / 60 = 3.33 hours x $75 = $250
Targeted initial accuracy at 85%
![Page 22: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/22.jpg)
Indexing costs
Year one
$60,000 + $10,000 + $9,375 = $79,375
Years thereafter
$9,000 + $250 = $9,250
85% accuracy
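The year-one and recurring indexing costs can be tallied from the itemized estimates. This is a sketch under stated assumptions: a 40-hour work week for the two weeks of programming (which is what makes $125 per hour come to $10,000), and hypothetical function names throughout.

```python
LICENSE_PRICE = 60_000      # base cost of MAIstro EE, from the slide
STAFF_RATE = 75             # $ per hour for rule building
PROGRAMMER_RATE = 125       # $ per hour for integration work
HOURS_PER_WEEK = 40         # assumption: standard work week


def year_one_cost(programming_weeks=2, rule_building_hours=125):
    """License plus integration programming plus initial rule building."""
    programming = programming_weeks * HOURS_PER_WEEK * PROGRAMMER_RATE
    rule_building = rule_building_hours * STAFF_RATE
    return LICENSE_PRICE + programming + rule_building


def ongoing_annual_cost(new_terms_per_quarter=50, auto_percent=80,
                        minutes_per_rule=5, maintenance_percent=15):
    """Annual maintenance fee plus hand-built rules for newly added terms."""
    maintenance = LICENSE_PRICE * maintenance_percent / 100
    new_terms = new_terms_per_quarter * 4
    manual_terms = new_terms * (100 - auto_percent) / 100
    rule_cost = manual_terms * minutes_per_rule * STAFF_RATE / 60
    return maintenance + rule_cost


if __name__ == "__main__":
    print(f"Year one:         ${year_one_cost():,.0f}")
    print(f"Years thereafter: ${ongoing_annual_cost():,.0f}")
```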
![Page 23: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/23.jpg)
ROI
Taxonomy costs = $67,500
Indexing costs = $79,375
Pain of search – difference = $5,678,400
If off by factor of 4, then a positive ROI of 241%
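The slide does not show how the 241% figure was derived. One reading that reproduces it, offered here as an assumption rather than the presenter's stated method, is that the savings are overstated by a factor of 4 and the costs understated by a factor of 4, with the result expressed as savings divided by costs. The function names below are hypothetical.

```python
TAXONOMY_COST = 67_500       # license & customize, from the slides
INDEXING_COST = 79_375       # year-one indexing cost, from the slides
ANNUAL_SAVINGS = 5_678_400   # pain of search difference, from the slides


def savings_to_cost_ratio(savings, cost, error_factor=1):
    """Savings / cost after shrinking savings and inflating costs by error_factor."""
    return (savings / error_factor) / (cost * error_factor)


def net_roi(savings, cost, error_factor=1):
    """Conventional net ROI: (savings - cost) / cost, same pessimistic adjustment."""
    return savings_to_cost_ratio(savings, cost, error_factor) - 1


if __name__ == "__main__":
    total_cost = TAXONOMY_COST + INDEXING_COST
    ratio = savings_to_cost_ratio(ANNUAL_SAVINGS, total_cost, error_factor=4)
    print(f"Total first-year cost: ${total_cost:,}")
    print(f"Savings-to-cost ratio, off by 4 both ways: {ratio:.1%}")
    print(f"Net ROI under the same assumption: {net_roi(ANNUAL_SAVINGS, total_cost, 4):.1%}")
```

Under this reading the ratio comes to about 241.6%. The conventional net ROI, which subtracts the cost first, would be lower by 100 percentage points, so it is worth stating which definition is in use when presenting the number.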
![Page 24: ROI & Impact - Quantitative & Qualitative Measures for Taxonomies](https://reader036.vdocument.in/reader036/viewer/2022062513/555067adb4c90574428b56be/html5/thumbnails/24.jpg)
Thank you!