Dark Data in the Long Tail of Science: Examples in Biology
DESCRIPTION
Presentation on challenges in acquiring, indexing, and disseminating scholarly data.
TRANSCRIPT
![Page 1: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/1.jpg)
Dark Data In the Long Tail of Science:
Examples in Biology
September 2, 2009
National Institute of Standards and Technology
P. Bryan Heidorn
NSF
University of Illinois
University of Arizona
![Page 2: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/2.jpg)
Introduction
Program Manager, Division of Biological Infrastructure, National Science Foundation
Associate Professor, Graduate School of Library and Information Science, University of Illinois
Director, School of Information Resources and Library Science, University of Arizona
JRS Biodiversity Foundation Board of Directors
![Page 3: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/3.jpg)
Cyberinfrastructure Vision
“The anticipated growth in both the production and repurposing of digital data raises complex issues not only of scale and heterogeneity, but also of stewardship, curation and long-term access.”
NSF Cyberinfrastructure Vision for 21st Century Discovery, Chapter 3
![Page 4: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/4.jpg)
Recognition of need for data curation
“Recommendation 6: The NSF, working in partnership with collection managers and the community at large, should act to develop and mature the career path for data scientists and to ensure that the research enterprise includes a sufficient number of high-quality data scientists.”
Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, Recommendations
![Page 5: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/5.jpg)
Recognition of the importance of Information
Recognition of the need for education
New work roles within traditional institutions
Interagency Working Group on Digital Data
![Page 6: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/6.jpg)
New Information Disciplines
Digital Curator: an expert knowledgeable of and with responsibility for the content of a digital collection(s)
Digital Archivist: an expert competent to appraise, acquire, authenticate, preserve, and provide access to records in digital form
Data Scientists: the information and computer scientists, database and software engineers and programmers, disciplinary experts, expert annotators, and others, who are crucial to the successful management of a digital data collection
(Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, report of the National Science Board, September 2005)
![Page 7: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/7.jpg)
Library Skills
![Page 8: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/8.jpg)
Economics of the long tail
The Long Tail, by Chris Anderson. Wired Magazine 12.10, 2004. (http://www.wired.com/wired/archive/12.10/tail_pr.html)
Netflix versus Blockbuster
Genbank versus Joe’s Lab
Big Science versus New Science
![Page 9: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/9.jpg)
Naive View of Science Data
$f(x) = a x^{k} + o(x^{k})$
Power Law of Science Data
$f(x) = a x^{k} + o(x^{k}) \mid x < 0.20$
[Figure: Data Volume (y-axis) vs. Science Projects and Initiatives (x-axis)]
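A minimal Python sketch of the head/tail split, assuming a pure power law $f(x) = a x^{k}$ over ranked projects $x = 1, \dots, N$ (the $o(x^{k})$ term dropped). The values $a = 1$, $k = -1$, and $N = 1000$ are hypothetical and do not come from the slide. The sketch reports how much of the total data volume the top 20% of projects hold.

```python
def volume_by_rank(n: int, a: float = 1.0, k: float = -1.0) -> list[float]:
    """Data volume f(x) = a * x**k for projects ranked x = 1..n (a, k hypothetical)."""
    return [a * x**k for x in range(1, n + 1)]


def head_share(volumes: list[float], head_fraction: float = 0.20) -> float:
    """Fraction of total volume held by the top `head_fraction` of ranked projects."""
    cutoff = int(len(volumes) * head_fraction)
    return sum(volumes[:cutoff]) / sum(volumes)


volumes = volume_by_rank(1000)  # hypothetical N = 1000 projects, largest first
head = head_share(volumes)
print(f"head (top 20%): {head:.2f} of volume, tail: {1 - head:.2f}")
```

With $k = -1$ the head holds roughly four-fifths of the volume. Flatter curves, with values of $k$ closer to 0, shift more of the volume into the tail.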
![Page 10: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/10.jpg)
Does NSF’s Data Follow the Power Law?
I do not know, but if $1 = X bytes…
[Figure: Awarded Amount 2007 (y-axis, $0 to $7,000,000) for each NSF grant (x-axis, grant index)]
![Page 11: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/11.jpg)
20-80 Rule: The small are big!
Total grants: 9,347 ($2,137,636,716)

| | Top 20% | Bottom 80% |
|---|---|---|
| Number of grants | 1,869 | 7,478 |
| Total dollars | $1,199,088,125 | $938,548,595 |
| Range | $6,892,810 to $350,000 | $350,000 to $831 |

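The slide's shares can be checked with a few lines of Python. The figures are copied from the 20-80 slide, and `share` is a hypothetical helper introduced only for this check.

```python
# Figures copied from the "20-80 Rule" slide (NSF awards, 2007).
TOTAL_GRANTS = 9347
TOTAL_DOLLARS = 2_137_636_716
GROUPS = {
    "top 20%": {"grants": 1869, "dollars": 1_199_088_125},
    "bottom 80%": {"grants": 7478, "dollars": 938_548_595},
}


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for name, g in GROUPS.items():
    print(f"{name}: {share(g['grants'], TOTAL_GRANTS):.1f}% of grants, "
          f"{share(g['dollars'], TOTAL_DOLLARS):.1f}% of dollars")
```

Run as written, the output shows the bottom 80% of awards still carrying roughly 44% of the dollars. This is the sense in which "the small are big".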
![Page 12: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/12.jpg)
Dark data is the data that we know is/was there but we can’t see it.
Hubble Space Telescope composite image showing a "ring" of dark matter in the galaxy cluster Cl 0024+17
![Page 13: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/13.jpg)
Related Ideas
John Porter: Deep versus Wide databases
Swanson: Undiscovered Public Knowledge
Science Commons: Big versus Small science
![Page 14: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/14.jpg)
Why is the tail also important?
• Valuable science data is in the tail
• Many scientists could use the tail data. Examples:
  • Unpublished observations of flowering time in Concord by Alfred Hosmer from 1888 to 1902
  • Photographs of flowers
  • Blue Hill Observatory meteorological data
  Richard B. Primack, Abraham J. Miller-Rushing, Daniel Primack, and Sharda Mukunda (2007). Using Photographs to Show the Effects of Climate Change on Flowering Time. Arnoldia 65(1), pp. 2-9.
• Science innovation occurs in the long tail
• Unpublished negative results, a.k.a. dark data
• We know very little about the tail
• Transformative science happens in the tail
• Computational thinking is needed to free the tail
• Current NSF investments in the tail
• OECD Principles and Guidelines for Access to Research Data from Public Funding
![Page 15: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/15.jpg)
Technical Solutions: Move the tail to the head (increase k)
• Data standards: e.g. Ecological Metadata Language (EML); e.g. TaxonX, taXMLit
• Metadata: Darwin Core (DwC); Access to Biological Collection Data (ABCD)
• Protocols: TAPIR
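Darwin Core is a vocabulary of terms rather than a file format. The Python sketch below uses the standard `xml.etree.ElementTree` to emit one specimen as `dwc:`-namespaced elements. The values come from the Polygala label (YU.001300) used later in the HERBIS examples. Mapping the barcode to `catalogNumber` is an assumption, and the `<record>` wrapper is a hypothetical container, not an element of any Darwin Core schema.

```python
import xml.etree.ElementTree as ET

DWC_NS = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace
ET.register_namespace("dwc", DWC_NS)


def to_dwc_xml(fields: dict[str, str]) -> str:
    """Serialize Darwin Core term/value pairs as dwc:* children of a <record> element."""
    record = ET.Element("record")  # hypothetical wrapper, not a Darwin Core schema element
    for term, value in fields.items():
        ET.SubElement(record, f"{{{DWC_NS}}}{term}").text = value
    return ET.tostring(record, encoding="unicode")


# Values transcribed from the Yale herbarium label YU.001300 (A. H. Curtiss, No. 503*).
label = {
    "catalogNumber": "YU.001300",  # assumption: barcode used as catalog number
    "recordedBy": "A. H. Curtiss",
    "recordNumber": "503*",
    "genus": "Polygala",
    "specificEpithet": "ambigua",
    "scientificNameAuthorship": "Nutt.",
    "habitat": "Coral soil",
    "locality": "Cudjoe Key, South Florida",
}
print(to_dwc_xml(label))
```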
![Page 16: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/16.jpg)
Solutions
• Controlled vocabularies: MeSH, ZooBank, IPNI, ITIS
• Ontologies: Gene Ontology (GO); Science Environment for Ecological Knowledge (SEEK); EcoGrid; Leopold; Semi-Automated Ontology Generation for Amphibian Morphology (DBI-0640053)
• (Semantic) web software
• DataNet
![Page 17: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/17.jpg)
Institutional Solutions
• Well-paid librarians
• Well-heeled museums
• Professional societies
• Generous publishers
Library director John Hanson told the Associated Press that a couple of dozen people are cited each year for failure to return materials or pay fines. The incident cost Dalibor about $30 for the two overdue paperbacks. It cost her mother $172 to free her.
Book and Bake Sale at the Mary E. Tippitt Memorial Library in Townsend.
Sailing yacht Maltese Falcon, owned by Tom Perkins
![Page 18: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/18.jpg)
Organizational Solutions
• LTER, NEON, GBIF, TDWG
• National Center for Ecological Analysis and Synthesis (NCEAS)
• National Evolutionary Synthesis Center (NESCent)
• European Union Networks of Excellence (NoE)
• European Distributed Institute of Taxonomy (EDIT)
• Digital Curation Centre (UK)
![Page 19: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/19.jpg)
Questions about the long tail
• How long is the tail?
• What is the area under the tail?
• How steep is the back of science data?
• How valuable could the tail be?
• What is the difference between tail science and head science?
• What is the differential distribution of sciences?
![Page 20: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/20.jpg)
Barriers
• Lack of professional reward structure
• Lack of education in data curation
• Intellectual property rights (IPR)
• Lack of technology
• Lack of financial reward structure
• Undervaluation / lack of investment
• Cost of infrastructure creation
• Cost of infrastructure maintenance
• PDF, Excel, MS Word, ArcView, floppy disks
![Page 21: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/21.jpg)
My Solutions
• Research: HERBIS; BioGeomancer; Next: Biodiversity Retrieval Evaluation Conference (BREC)
• Education: Biological Informatics Masters; Data Curation
• Service: JRS Biodiversity Foundation; National Science Foundation; Taxonomic Database Working Group
![Page 22: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/22.jpg)
Automatic Metadata Extraction (Darwin Core) From Museum Specimen Labels
2008 Dublin Core Conference
P. Bryan Heidorn, Qin Wei
University of Illinois at Urbana-Champaign
…<co> Curtis, </co><hdlc> North American Pl</hdlc><cnl> No.</cnl><cn> 503*</cn><gn> Polygala</gn><sp> ambigua,</sp><sa> Nutt.,</sa><val> var.</val><hb> Coral soil,</hb><lc> Cudjoe Key, South Florida.</lc><col> Legit</col><co> A. H. Curtiss.</co><dt>February</dt>…
![Page 23: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/23.jpg)
The problem
• >1 billion natural history specimens
• Collected over 250 years, in many languages
• No publishing standards
• Near-infinite classes
Your high school teacher lied:
• 6 min/label × 1B labels = 100M hours
• Saving 1 min per label = 16.7 million hours
• At $10/hr = $167,000,000
• 1/4790 of the U.S. deregulation financial bailout
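The labor estimate above is simple arithmetic, reproduced here in Python so the units are explicit. The one-label-per-specimen count follows the slide's own assumption.

```python
LABELS = 1_000_000_000   # ">1 Billion" specimens, assuming one label each
MINUTES_PER_LABEL = 6    # slide's transcription estimate
WAGE_PER_HOUR = 10       # dollars


def hours_for(minutes_per_label: float, labels: int = LABELS) -> float:
    """Person-hours needed to spend `minutes_per_label` on every label."""
    return minutes_per_label * labels / 60


total_hours = hours_for(MINUTES_PER_LABEL)
saved_hours = hours_for(1)              # shaving one minute off every label
saved_dollars = saved_hours * WAGE_PER_HOUR
print(f"total: {total_hours:,.0f} h")
print(f"1 min saved per label: {saved_hours:,.0f} h, ${saved_dollars:,.0f}")
```

Before rounding, the saving is about 16.67 million hours and $166.7 million. The slide rounds these to 16.7 million hours and $167,000,000.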
![Page 24: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/24.jpg)
Why care about the specimens?
Largest extinction since the Cretaceous period
Rapid environmental change
![Page 25: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/25.jpg)
http://www.ncdc.noaa.gov/img/climate/globalwarming/ar4-fig-3-9.gif
![Page 26: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/26.jpg)
Why care
• Largest mass extinction in millions of years
• Rapid environmental change
• Historic distribution of species
• Ecological niche modeling (invasiveness, crop hardiness, pest potential)
• Projections of the impact of climate change
• Where did Herbert Lang and James Chapin go on the Congo Expedition (1909-1915)?
• Will I see a Kirtland's Warbler here?
• Are some potato species resistant to potato blight?
• When did linden trees bloom before the Industrial Revolution?
![Page 27: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/27.jpg)
A real-life example: Baronia brevicornis and its single food plant, Acacia cochliacantha (Soberon)
![Page 28: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/28.jpg)
B. brevicornis Abiotic Niche using BS Garp
![Page 29: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/29.jpg)
Natural History Specimens
![Page 30: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/30.jpg)
Sample records
![Page 31: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/31.jpg)
Sample OCR Output
Yale University Herbarium
~r-^""" r-n-------
YU.001300
Curtisb, North American Pl
C^o.nr r^-n
ANTS,
No. 503* "^
Polygala ambigna, Nntt., var.
Coral soil, Cudjoe Key, South Florida.
Legit A. H. Curtiss.
![Page 32: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/32.jpg)
Label Labels
• bc - barcode
• bt - barcode text
• cm - common/colloquial name
• cn - collection number
• co - collector
• cd - collection date
• fm - family name
• ft - footer info
![Page 33: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/33.jpg)
Label Labels
• gn - genus name
• hd - header info
• in - infra name
• ina - infra name author
• lc - location
• pd - plant description
• sa - scientific name author
• sp - species name
![Page 34: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/34.jpg)
Example Training Record
<?xml version="1.0" encoding="UTF-8"?><?oxygen
RNGSchema="http://www3.isrl.uiuc.edu/~TeleNature/Herbis/semanticrelax.rng" type="xml"?>
<labeldata><bt>Yale University Herbarium</bt><ns> ~r-^""" r-n------</ns><bc> YU.001300</bc><co cc="Curtiss"> Curtisb, </co><hdlc cc="North American Plants"> North American Pl</hdlc><ns>C^o.nr r^-nANTS,</ns><cnl> No.</cnl><cn> 503*</cn><ns> "^</ns><gn> Polygala</gn><sp> ambigna,</sp><sa> Nntt.,</sa><val> var.</val><hb> Coral soil,</hb><lc> Cudjoe Key, South Florida.</lc><col> Legit</col><co> A. H. Curtiss.</co></labeldata>
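The learner trains on marked-up records like this one. The minimal sketch below uses Python's standard `xml.etree.ElementTree` and assumes that each direct child of `<labeldata>` is one labeled text segment. It turns the record into (tag, text) training pairs. The record is abbreviated, and it keeps the OCR errors (e.g. "ambigna", "Nntt.") as they appear above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the training record above; the <?xml?> and <?oxygen?> lines are omitted.
RECORD = (
    "<labeldata><bc> YU.001300</bc><cnl> No.</cnl><cn> 503*</cn>"
    "<gn> Polygala</gn><sp> ambigna,</sp><sa> Nntt.,</sa><hb> Coral soil,</hb>"
    "<lc> Cudjoe Key, South Florida.</lc><col> Legit</col><co> A. H. Curtiss.</co></labeldata>"
)


def labeled_segments(xml_text: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for each child of <labeldata>, in document order."""
    root = ET.fromstring(xml_text)
    return [(child.tag, (child.text or "").strip()) for child in root]


for tag, text in labeled_segments(RECORD):
    print(f"{tag:4} {text}")
```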
![Page 35: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/35.jpg)
Supervised Learning Framework
[Figure: Training phase: Gold classified labels → Machine learner → Trained model. Application phase: Unclassified labels → Segmentation → Segmented text → Machine classifier → Silver classified labels → Human editing]
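The framework trains a learner on gold labels and applies it to segmented, unclassified text. The slides compare naive Bayes (NB) with a hidden Markov model (HMM). The sketch below covers only the NB side, as a bag-of-words classifier for already-segmented text. The tokenization, the add-one smoothing, and the training pairs are hypothetical; HERBIS's actual features are not described here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Multinomial naive Bayes over lower-cased word tokens, with add-one smoothing."""

    def __init__(self):
        self.tag_counts = Counter()              # tag -> number of training segments
        self.word_counts = defaultdict(Counter)  # tag -> token -> count
        self.vocab = set()

    @staticmethod
    def tokens(text: str) -> list[str]:
        return text.lower().replace(",", " ").replace(".", " ").split()

    def train(self, examples: list[tuple[str, str]]) -> None:
        """examples: (segment text, gold tag) pairs."""
        for text, tag in examples:
            self.tag_counts[tag] += 1
            for tok in self.tokens(text):
                self.word_counts[tag][tok] += 1
                self.vocab.add(tok)

    def classify(self, text: str) -> str:
        """Return the tag with the highest log posterior for the segment."""
        total = sum(self.tag_counts.values())
        best_tag, best_score = "", -math.inf
        for tag, n in self.tag_counts.items():
            denom = sum(self.word_counts[tag].values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in self.tokens(text):
                score += math.log((self.word_counts[tag][tok] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Hypothetical gold segments in the style of the HERBIS tags (gn, hb, lc).
gold = [("Polygala", "gn"), ("Quercus", "gn"),
        ("Coral soil", "hb"), ("sandy soil", "hb"),
        ("Cudjoe Key, South Florida", "lc"), ("Key West, Florida", "lc")]
nb = NaiveBayesTagger()
nb.train(gold)
print(nb.classify("Florida Keys"))
```

This classifier treats each segment independently. An HMM, the other model in the comparison, also models the order in which label elements occur.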
![Page 36: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/36.jpg)
HERBIS Experimental Data
• 295 marked-up records
• 74 label states
• 5-fold cross-validation
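In 5-fold cross-validation over 295 records, each record is tested exactly once and each model trains on the other four folds. Records are normally shuffled first. Whether HERBIS shuffled or stratified its records is not stated, so this sketch uses contiguous folds.

```python
def k_fold_indices(n: int, k: int = 5) -> list[tuple[list[int], list[int]]]:
    """Split record indices 0..n-1 into k contiguous folds; return (train, test) per fold."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return [
        ([j for other in folds[:i] + folds[i + 1:] for j in other], folds[i])
        for i in range(k)
    ]


splits = k_fold_indices(295, 5)  # 295 marked-up records, 5 folds
for fold, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold}: train on {len(train)} records, test on {len(test)}")
```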
![Page 37: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/37.jpg)
Performances of NB and HMM
[Figure: F-score (y-axis, 0% to 100%) by element (x-axis: bc, bt, cd, cdl, cm, cml, cn, cnl, co, col, ct, dtl, fm, fml, gn, hb, hbl, hdlc, in, latlon, lc, lcl, pd, sa, snl, sp) for naive Bayes (NB) and hidden Markov model (HMM)]
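The chart reports an F-score per label element. The slides do not say whether this is F1 or whether counts are taken per token or per element. The sketch below assumes the standard F-beta formula over per-element true positive, false positive, and false negative counts. The counts are hypothetical.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from true positive, false positive, and false negative counts (F1 when beta=1)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical per-element counts for one model: (tp, fp, fn).
counts = {"gn": (90, 5, 10), "lc": (60, 25, 30)}
for element, (tp, fp, fn) in counts.items():
    print(f"{element}: F1 = {f_score(tp, fp, fn):.2f}")
```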
![Page 38: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/38.jpg)
Element Identifiers
![Page 39: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/39.jpg)
Improved Performance With Field Element Identifiers
Improved Performance With FEI Encoding
[Figure: F-score difference (y-axis, -30% to 70%) by label element (x-axis)]
![Page 40: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/40.jpg)
![Page 41: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/41.jpg)
Learning with pre-categorization
[Figure: Training: Gold labels → Categorization → Class 1 … Class n labels → one Machine learner per class → Model 1 … Model n. Application: Unclassified labels → Categorization → Class 1 … Class n labels → Machine classification with the matching model → Classified labels]
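Pre-categorization routes each label to a class-specific model before classification. The sketch below shows only the routing step, with a fallback to a generic model. The collector-based categorizer is a hypothetical stand-in, suggested by the specialist experiment that trains on 100 Curtiss labels. The "models" are placeholders that report which one was chosen.

```python
from typing import Callable

Classifier = Callable[[str], str]


def route_and_classify(label_text: str,
                       categorize: Callable[[str], str],
                       specialists: dict[str, Classifier],
                       generic: Classifier) -> str:
    """Categorize a label, then classify it with that category's model, or the generic one."""
    category = categorize(label_text)
    model = specialists.get(category, generic)
    return model(label_text)


# Hypothetical stand-ins: categorize by collector name; models just report themselves.
def by_collector(text: str) -> str:
    return "curtiss" if "curtiss" in text.lower() else "unknown"


def curtiss_model(text: str) -> str:
    return "curtiss model"


def generic_model(text: str) -> str:
    return "generic model"


specialists = {"curtiss": curtiss_model}
print(route_and_classify("Legit A. H. Curtiss.", by_collector, specialists, generic_model))
print(route_and_classify("Coll. J. Smith", by_collector, specialists, generic_model))
```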
![Page 42: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/42.jpg)
FIG. 5. Improved Performance of Specialist Model
[Figure: F-score (y-axis, 0% to 90%) by iteration number (x-axis, 1 to 10) for Specialist Model (10+), Generic Model (10+), and Generic Mean (200); Specialist: 100 Curtiss vs. 100 General]
![Page 43: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/43.jpg)
P. Bryan Heidorn¹, Hong Zhang¹, Eugene Chung², and BGWG
¹Graduate School of Library and Information Science, ²Linguistics, University of Illinois
Machine Learning in BioGeomancer’s Locality Specification
SPNHC & NSCA 2006
![Page 44: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/44.jpg)
BioGeomancer Working Group (BGWG) http://203.202.1.217/bgwebsite/index.html
Worldwide collaboration of natural history and geospatial data experts
Maximize the quality and quantity of biodiversity data that can be mapped
Support of scientific research, planning, conservation, and management
Promotes discussion, manages geospatial data and data standards, and develops software tools in support of this mission
![Page 45: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/45.jpg)
Participants
![Page 46: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/46.jpg)
Example Locality Types

| Record # | Specification of location | Locality type |
|---|---|---|
| 43 | dario 7 mi wnw of; RIO VIEJO | FOH; F |
| 86 | near Aleutian Islands; S of Amukta Pass | NF; FH |
| 100 | INDIAN CREEK, 11 MI. W HWY 160 | P; POH |
| 109 | TIESMA RD, 1.5 MI NW EDGEWATER; OFF LAKE MICHIGAN | R P; FOH; NP |
| 160 | WALTMAN, 9 MI N, 2.5 MI W OF | FOO |
| 181 | 0.4 mi N Collinston on LA 138 | FPOH |
| 204 | Seward Peninsula; vic. Bluff, S coast | F; NF; FS |

![Page 47: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/47.jpg)
![Page 48: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/48.jpg)
JOH: offset from a junction at a heading, e.g. "0.5 mi. W Sandhill and Hagadorn Roads"
FRAME
OFFSET VALUE = 0.5
DIRECTION = W
UNIT = mile
JUNCTION [ FEATURE [ CITY = Sandhill ]] [ FEATURE [ ROAD = Hagadorn Roads ]]
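The frame above is the target of locality parsing. The slides describe a machine-learning approach. The Python sketch below is a hand-written regular-expression stand-in for one surface form only, "<value> <unit> <heading> <feature> and <feature>". It does not type the features as CITY or ROAD, and the unit table is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class JunctionOffset:
    """Frame for a JOH locality: an offset from a junction at a heading."""
    value: float
    unit: str
    direction: str
    junction: tuple[str, str]


UNITS = {"mi": "mile", "km": "kilometer"}  # hypothetical, minimal unit table

# Hand-written pattern for "<value> <unit>[.] <heading> <feature A> and <feature B>".
JOH_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mi|km)\.?\s+"
    r"(?P<heading>[NSEW]{1,3})\s+(?P<a>.+?)\s+and\s+(?P<b>.+)",
    re.IGNORECASE,
)


def parse_joh(text: str) -> Optional[JunctionOffset]:
    """Return a JunctionOffset frame, or None if the text is not in the supported form."""
    m = JOH_PATTERN.fullmatch(text.strip())
    if m is None:
        return None
    return JunctionOffset(
        value=float(m["value"]),
        unit=UNITS[m["unit"].lower()],
        direction=m["heading"].upper(),
        junction=(m["a"], m["b"]),
    )


print(parse_joh("0.5 mi. W Sandhill and Hagadorn Roads"))
```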
![Page 49: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/49.jpg)
Xiaoya Tang and P. Bryan Heidorn
Different vocabularies in queries and documents
User query: "Long leaves"
Description of leaf: "… Leaves 20–75, many-ranked, spreading and recurved, not twisted, gray-green (rarely variegated with linear cream stripes), to 1 m × 1.5–3.5 cm, … Inflorescences: … spikes very laxly 6–11-flowered, erect to spreading, 2–3-pinnate, …"
Length in texts: "to 1 m"
![Page 50: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/50.jpg)
Information Extraction From FNA
[Figure components: user log analysis; templates for useful information; original documents; knowledge bases; extraction rules; structured information]
Templates for useful information: Leaf_Shape, Leaf_Margin, Leaf_Apex, Leaf_Base, Blade_Dimension, …
Original document: "… Leaf blade obovate to nearly orbiculate, 3--9 × 3--8 cm, leathery, base obtuse to broadly cuneate, margins flat, coarsely and often irregularly doubly serrate to nearly dentate, …"
Knowledge base: … PartBlade: Leaf blade, Blades, blade …
Extraction rules:
Pattern:: * <PartBlade> ' ' <leafShape> * ( <leafShape> ) ',' *
Output:: leaf {leafShape $1}
Pattern:: * <PartBlade> * ', ' ( <Range> ' ' * <LengUnit> ) * <PartBase>
Output:: leaf {bladeDimension $1}
Structured information: Leaf_Shape obovate; Leaf_Shape orbiculate; Blade_Dimension 3–9 × 3–8 cm; …
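The two extraction rules can be approximated with Python regular expressions. This is a sketch, not the original rule engine. Only the PartBlade synonyms come from the slide. The leaf-shape list is a small hypothetical subset, and the trailing `<PartBase>` context of the second rule is not checked.

```python
import re

# Illustrative knowledge-base entries; only the PartBlade synonyms come from the slide.
PART_BLADE = r"(?:leaf blade|blades|blade)"
LEAF_SHAPE = r"(?:obovate|orbiculate|ovate|elliptic|lanceolate)"  # hypothetical subset
RANGE = r"\d+(?:\.\d+)?(?:\s*(?:--|–|-)\s*\d+(?:\.\d+)?)?"        # e.g. "3--9"
LENGTH_UNIT = r"(?:mm|cm|dm|m)"

# Rule 1: <PartBlade> <leafShape> [to [nearly] <leafShape>]
SHAPE_RE = re.compile(
    rf"{PART_BLADE}\s+({LEAF_SHAPE})(?:\s+to\s+(?:nearly\s+)?({LEAF_SHAPE}))?",
    re.IGNORECASE)
# Rule 2: <PartBlade> ... , <Range> × <Range> <LengUnit>
DIM_RE = re.compile(
    rf"{PART_BLADE}.*?,\s*({RANGE}\s*×\s*{RANGE}\s*{LENGTH_UNIT})\b",
    re.IGNORECASE)


def extract_leaf(text: str) -> dict[str, list[str]]:
    """Fill a tiny leaf template from one FNA-style description."""
    found: dict[str, list[str]] = {"leafShape": [], "bladeDimension": []}
    if m := SHAPE_RE.search(text):
        found["leafShape"] = [shape for shape in m.groups() if shape]
    if m := DIM_RE.search(text):
        found["bladeDimension"] = [m.group(1)]
    return found


doc = ("Leaf blade obovate to nearly orbiculate, 3--9 × 3--8 cm, leathery, "
       "base obtuse to broadly cuneate, margins flat")
print(extract_leaf(doc))
```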
![Page 51: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/51.jpg)
Results – System Performance

| Group | NT | NTH | TSR | SSR | NSST | TST | NDVST |
|---|---|---|---|---|---|---|---|
| SEARFA | 6.75 | 8.078 | 0.860 | 0.210 | 4.779 | 338.8 | 11.16 |
| SEARF | 4.50 | 3.598 | 0.568 | 0.053 | 9.584 | 435.2 | 14.75 |
| Sig. (ANOVA) | 0.005 | 0.005 | 0.000 | 0.011 | 0.000 | 0.72 | 0.162 |

NT: number of tasks accomplished in total
NTH: number of tasks accomplished per hour
TSR: task success rate
SSR: search success rate
NSST: number of searches to accomplish a task
TST: time spent to accomplish a task
NDVST: number of documents viewed to accomplish a task
![Page 52: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/52.jpg)
Education Programs
Biological Information Specialist
Concentration in Data Curation (MSLIS)
Certificate of Advanced Study in Data Curation
Information and professional education in biodiversity informatics
![Page 53: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/53.jpg)
Biological Information Specialists
At present:
Biologists at all degree levels self-trained in information technology
Information technologists at all degree levels self-trained in biology
(both with gaps in knowledge for many months, years)
Differing roles of BIS in large and small
![Page 54: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/54.jpg)
Master of Science in Biological Informatics
Degree Program began September 2007
Part of campus-wide bioinformatics master's program
NSF/CISE/IIS, Education Research and Curriculum Development, 0534567 (Palmer, PI)
Combines Biology, Bioinformatics, Computer Science core with LIS courses
![Page 55: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/55.jpg)
What does a BIS need to know?
Biological training and interest in solving biological research problems
Information skills:
• Evaluation and implementation of information systems: user-based assessment and continual quality improvement for the development of tools that work and are used.
• Information acquisition, management, and dissemination: development of digital libraries, data archives, institutional repositories, and related tools.
• Information organization and integration: ontology development, structuring information for optimal use and sharing, and standards development.
![Page 56: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/56.jpg)
UIUC bioinformatics core coursework
Cross-disciplinary course distribution requirement
• Bioinformatics: Computing in Molecular Biology; Algorithms in Bioinformatics; Principles of Systematics
• Computer Science: Algorithms; Database Systems
• Biology: Human Genetics; Introductory Biochemistry; Macromolecular Modeling
![Page 57: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/57.jpg)
Sample of existing LIS courses
Information Organization and Knowledge Representation
• LIS 551 Interfaces to Information Systems
• LIS 590DM Document Modeling
• LIS 590RO Representing and Organizing Information Resources
• LIS 590ON Ontologies in Natural Science
Information Resources, Uses, and Users
• LIS 503 Use and Users of Information
• LIS 522 Information Sources in the Sciences
• LIS 590TR Information Transfer and Collaboration in Science
Information Systems
• LIS 456 Information Storage and Retrieval
• LIS 509 Building Digital Libraries
• LIS 566 Architecture of Network Information Systems
• LIS 590EP Electronic Publishing
Disciplinary Focus
• LIS 530B Health Sciences Information Services and Resources
• LIS 590HI Healthcare Informatics (Healthcare Infrastructure)
• LIS 590EI/BDI Ecological Informatics (Biodiversity Informatics)
![Page 58: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/58.jpg)
MSLIS Data Curation Concentration
Data Curation Educational Program (DCEP)
IMLS – Laura Bush 21st Century Librarian Program,
RE-05-06-0036-06 (Heidorn, PI)
Students with the DC concentration will be trained to add value to data and promote sharing across labs and disciplinary specializations
![Page 59: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/59.jpg)
New research directions
• Focus on integration and scale
• Informatics infrastructure as competitive edge
Sample areas of development
• Landinformatics Group: atmospheric science, hydrology, nutrient balance, carbon cycle, ecology, agronomy
• BREC: focus on data integration problems across a larger range of sciences
![Page 60: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/60.jpg)
Example Service
• JRS Biodiversity Foundation
• National Science Foundation
• Taxonomic Database Working Group
![Page 61: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/61.jpg)
JRS Biodiversity Foundation
History: The J.R.S. Biodiversity Foundation was created in January 2004 when the nonprofit publishing company BIOSIS was sold to Thomson Scientific. The proceeds from that sale were applied to fund an endowment and create a new grant-making foundation.
Mission: The Foundation defined a mission within the field of biodiversity: To enhance knowledge and promote the understanding of biological diversity for the benefit and sustainability of life on earth.
![Page 62: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/62.jpg)
JRS Biodiversity Foundation
Scope: To further advance the Foundation’s mission a scope was developed as: Interdisciplinary activities primarily carried out via collaborations in developing countries and economies in transition. The Foundation Board of Trustees has expressed a particular interest in focusing its grant-making in Africa.
Strategic Interest: Within those bounds a considered course has been chosen to: Advance projects, or parts of biodiversity projects that focus on: (1) collecting data, (2) aggregating, synthesizing, publishing data, and making it more widely available to potential end users, and (3) interpreting and gaining insight from data to inform policy-makers
![Page 63: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/63.jpg)
Grant Making: about $2M/yr
• Animal Tracking in South Africa
• Specimen Digitization in Ghana
• Social Value of Conservation in Peru
• Species Pages and BD Education in Costa Rica
• Niche Modeling in Brazil
• Travel Grants
• Lake Victoria Data Library Project in Tanzania, Uganda and Kenya
• e-Biosphere ’09
![Page 64: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/64.jpg)
National Science Foundation
• Advances in Biological Informatics
• Data Working Group
• Plant Science Cyberinfrastructure Center (iPlant)
• Cyber-enabled Discovery and Innovation
• Hiring Committees
• Division of Biological Infrastructure Planning
![Page 65: Dark Data In the Long Tail of Science: Examples in Biology](https://reader035.vdocument.in/reader035/viewer/2022070316/5559fde6d8b42aa8098b4c66/html5/thumbnails/65.jpg)
Taxonomic Database Working Group
• Structure of Descriptive Data
• Education Initiative
• HERBIS
• Taxonomic Name Identification