t2d data identification curation duration
Post on 04-Apr-2022
TABLE TO DATA (T2D) PROJECT
Approved March/08 at the
COPPUL directors’ meeting as a
collaborative project seeking to
implement a system of linking
articles & data in open access
journals published at COPPUL
institutions.
T2D ACTIVITIES TO DATE
May/08: Brainstorming at IASSIST conference
July/08: Drupal Wiki established & “Outline of Activities” disseminated to project members
Fall/08: Maxine undertook a Literature Search (building on work done by Jim Jacobs, Feb/08)
December/08: Maxine reported at ACCOLEDS and renewed effort to involve project members
Spring/09: Maxine investigated related project topics in connection with Study Leave research
Additionally, Chuck liaised and advocated for the project throughout the timeline, and some project members consulted with OA publishers.
T2D PROJECT STAGES
1. Investigating
Literature Searches re: background, tools, etc.
2. Recruiting
Open access publishers amenable to a pilot project
Researchers willing to deposit data
3. Marking
Develop a set of descriptive tags for table content
Identify which parts of a data file “should” be
linked and/or archived
4. Tooling (i.e., tools for markup, searching & display)
5. Evaluating/Reporting (i.e., HOW the project
results contribute to research, teaching & learning)
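For the Marking stage, a minimal sketch of what a tagged table record might look like is given here. The project has not yet defined its tag set, so every field name, tag and identifier below is hypothetical and serves only to illustrate the idea of attaching descriptive tags and a dataset link to a single table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableRecord:
    """One table from an article, with descriptive tags (all fields hypothetical)."""
    article_id: str
    table_label: str
    title: str
    tags: List[str] = field(default_factory=list)
    dataset_url: str = ""  # left empty when no archived dataset is known

    def has_tag(self, tag: str) -> bool:
        # Case-insensitive tag lookup
        return tag.lower() in (t.lower() for t in self.tags)


# Made-up example record
record = TableRecord(
    article_id="article-1",
    table_label="Table 2",
    title="Median income by province",
    tags=["income", "province"],
)
print(record.has_tag("Income"))
```

An empty dataset_url marks a table whose underlying data has not (yet) been linked or archived.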
SO … WHAT IS IN IT FOR US?
This seemed like a reasonable question to
investigate further in the literature as
“background information”.
TAKING INTO ACCOUNT RESEARCHERS’ DISCIPLINARY
DIFFERENCES, TABLES/FIGURES ARE INCREASINGLY:
used as a more effective summary of the
article’s content than subject headings or
other descriptors
used as a quick means of identifying types
of data, methodologies &/or results
used to assess article relevance before
reading the entire article
less effective if completely extracted from
the surrounding explanatory text and/or
complementary tables/figures
DISAGGREGATION
Disaggregation of article components such as
tables/figures facilitates searching at a
greater level of granularity in order to:
Improve search precision (share of retrieved
items that are relevant) & recall (share of
relevant tables/figures retrieved, including
those a traditional search would miss)
Facilitate the REAGGREGATION of a journal
article’s components into new
forms/formats
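As a rough illustration of the two search measures named on this slide, a minimal Python sketch follows. Precision is taken here as the share of retrieved items that are relevant, and recall as the share of relevant items that are retrieved. The identifiers and result sets are made up for the example and do not come from any real search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers: what the researcher actually needs
relevant = {"table-1", "table-2", "figure-3", "table-4"}

# Hypothetical results of an article-level search and a table-level search
article_level = {"table-1", "article-9"}
table_level = {"table-1", "table-2", "figure-3", "table-7"}

print(precision_recall(article_level, relevant))
print(precision_recall(table_level, relevant))
```

In this made-up case the table-level search scores higher on both measures, which is the kind of gain that disaggregation is hoped to deliver.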
REAGGREGATION?
Researchers wish to easily incorporate
tabular information:
into new documents (to support original
research)
into multimedia documents (to support
presentations - classrooms or conferences)
into other contexts (utilize data in
pre-existing tables rather than generate new
datasets, which can be time-consuming and/or
expensive)
into a comparison of similar information (to
check one’s own work against other work)
SO … WHAT CAN MAKE IT
EASIER TO RETRIEVE RELEVANT
TABLES/FIGURES?
The literature in this area was decidedly sparse, or
not quite as “on-topic” as one would have hoped.
OVERVIEW OF LITERATURE REVIEW
The research mostly dealt with such topics as:
Making T&F (tables/figures) more
accessible to the visually impaired.
Improved graphical presentation of T&F.
Poor quality of T&F replication in
electronic versions of documents.
Improved dissemination of statistical
information.
Full-text does not necessarily mean the
inclusion of T&F.
FORMAT-SPECIFIC DATABASES
TableBase (Gage; 1997+)
table title, table text, and descriptor fields are
searchable
text that accompanies the table is not
searchable or retrievable from the product
tables are directly downloadable to Excel
Statistical Universe (Lexis-Nexis
PowerTables; 2000+)
users search by “criteria”
links to full-text documents in the
CIS/LEXIS-NEXIS digital archive & on WWW sites
download a PDF file or an Excel spreadsheet
DATABASES WITH
“DEEP INDEXING” FEATURES
Illustrata (ProQuest/CSA; 2006+)
assigns 7-8 index terms per image (these are
searchable but not the table text itself)
thumbnail images for quick preview
links to full-text and other components within
the product
Selected ProQuest Databases (Oct. 1, 2009+)
deep indexing of images added along with
traditional abstracting & indexing of text (at no
additional cost)
PRODUCTS THAT INDEX
TABLE CONTENT
TableSeer (search engine; 2006+)
automatically identifies tables in digital
documents and extracts the contents in the
cells of the tables
contents are stored in a queryable database
table; the system also extracts table metadata
and uses a novel ranking function to search for
tables relevant to user queries
BioText Search Engine (freely available
web-based application; 2007+)
searches over 300 open access journals
ability to search for words within a table
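The general idea behind these products (store table cells in a database, then search for words within them) can be sketched in a few lines of Python. This is not how TableSeer or BioText are implemented: the schema, the function names build_index and search_tables, and the sample tables are all hypothetical, and a simple count of matching cells stands in for any real ranking function.

```python
import sqlite3


def build_index(tables):
    """Store every cell of every table as one row in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cells "
        "(table_id TEXT, row_no INTEGER, col_no INTEGER, content TEXT)"
    )
    for table_id, rows in tables.items():
        for r, row in enumerate(rows):
            for c, content in enumerate(row):
                conn.execute(
                    "INSERT INTO cells VALUES (?, ?, ?, ?)",
                    (table_id, r, c, str(content)),
                )
    return conn


def search_tables(conn, word):
    """Return ids of tables with cells containing `word`, most matches first."""
    cur = conn.execute(
        "SELECT table_id, COUNT(*) AS n FROM cells "
        "WHERE LOWER(content) LIKE ? "
        "GROUP BY table_id ORDER BY n DESC, table_id",
        (f"%{word.lower()}%",),
    )
    return [table_id for table_id, _ in cur.fetchall()]


# Made-up tables, keyed by a hypothetical article/table identifier
tables = {
    "article-1/table-2": [["Province", "Income"], ["Alberta", "52000"]],
    "article-2/table-1": [
        ["Income group", "Share"],
        ["Low income", "0.2"],
        ["High income", "0.3"],
    ],
}
conn = build_index(tables)
print(search_tables(conn, "income"))
```

Because each cell keeps its table identifier, a hit can be traced back to the article it came from, which is the link the T2D project is interested in.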
SO … WHAT DOES THIS ALL
MEAN FOR THE T2D PROJECT?
Not exactly sure, but perhaps, given this trend
in the Abstracting & Indexing industry, we might
investigate developing a “SocioText” type of
product to index open access journals such as the
Canadian Journal of Sociology??
SO … WHAT ELSE NEEDS TO BE
“PUT ON THE TABLE”?
What if the table information is insufficient and
I want to look at the entire dataset?
Where is the entire dataset?
Who owns the entire dataset?
When will it become available for me to use?
How can I get my hands on it?
IDENTIFIC/CUR/DUR-ATION!
Personal Websites
Institutional Repositories
Subject-specific Repositories such as:
Dryad - http://datadryad.org/repo
ExLab - http://exlab.bus.ucf.edu
AND THEN PERHAPS, there’s still:
Desk Drawers (aka: LOST)